
## GLUE results

We also evaluate the language understanding performance of Uni-Perceiver on the GLUE benchmark. The results are listed below.

| Model | MNLI (Acc) | QNLI (Acc) | QQP (F1) | RTE (Acc) | SST-2 (Acc) | MRPC (F1) | CoLA (Acc) |
|:--|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Uni-Perceiver<sub>BASE</sub> | 79.7 | 87.3 | 86.7 | 71.1 | 89.3 | 86.0 | 43.1 |
| Uni-Perceiver-MoE<sub>BASE</sub> | 81.5 | 88.2 | 87.8 | 75.8 | 90.9 | 87.1 | 52.2 |
| Uni-Perceiver<sub>LARGE</sub> | 82.5 | 89.2 | 87.7 | 73.7 | 91.2 | 90.2 | 52.0 |
| Uni-Perceiver-MoE<sub>LARGE</sub> | 85.7 | 91.9 | 89.5 | 78.4 | 93.4 | 91.2 | 57.4 |

- All fine-tuning experiments are performed on 1 GPU.

- We use the hyper-parameters for GLUE tasks from fairseq (an example command is sketched after the table below):

| Hyper-parameter | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B |
|:--|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| `--num-classes` | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 1 |
| `--lr` | 5e-6 | 1e-5 | 1e-5 | 1e-5 | 5e-6 | 2e-5 | 2e-5 | 2e-5 |
| bsz | 128 | 32 | 32 | 32 | 128 | 64 | 64 | 32 |
| `--total-num-update` | 30968 | 33112 | 113272 | 1018 | 5233 | 1148 | 1334 | 1799 |
| `--warmup-updates` | 1858 | 1986 | 6796 | 61 | 314 | 68 | 80 | 107 |
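
As a concrete illustration, the sketch below assembles these values into a fairseq-style launch for MNLI. This is a minimal sketch rather than the project's actual fine-tuning script: the data directory `MNLI-bin/`, the checkpoint path, and `--arch roberta_base` are placeholder assumptions, and some standard fairseq flags (e.g. `--max-positions`, dropout settings) are omitted for brevity.

```bash
# Minimal fairseq-style sketch for fine-tuning on MNLI with the
# hyper-parameters from the table above. MNLI-bin/, the checkpoint path,
# and --arch roberta_base are placeholders, not the project's real values.
CUDA_VISIBLE_DEVICES=0 fairseq-train MNLI-bin/ \
    --restore-file /path/to/pretrained_checkpoint.pt \
    --reset-optimizer --reset-dataloader --reset-meters \
    --task sentence_prediction \
    --criterion sentence_prediction \
    --arch roberta_base \
    --num-classes 3 \
    --batch-size 128 \
    --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \
    --lr-scheduler polynomial_decay --lr 5e-6 \
    --total-num-update 30968 \
    --warmup-updates 1858 \
    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric
```

Note that `--warmup-updates` is consistently about 6% of `--total-num-update` (e.g. 1858 ≈ 0.06 × 30968 for MNLI), matching fairseq's RoBERTa fine-tuning recipe.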
- Following RoBERTa, we fine-tune RTE, STS-B, and MRPC starting from the MNLI single-task model rather than from the baseline pretrained model, as sketched below.
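
Concretely, this amounts to swapping the restore path in the sketch above for the fine-tuned MNLI checkpoint and applying the target task's own hyper-parameters, e.g. for RTE (paths remain placeholders):

```bash
# Warm-start RTE from the MNLI single-task checkpoint (placeholder path),
# using the RTE column of the hyper-parameter table; other flags as above.
CUDA_VISIBLE_DEVICES=0 fairseq-train RTE-bin/ \
    --restore-file /path/to/mnli_single_task_checkpoint.pt \
    --reset-optimizer --reset-dataloader --reset-meters \
    --task sentence_prediction --criterion sentence_prediction \
    --arch roberta_base --num-classes 2 --batch-size 32 \
    --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \
    --lr-scheduler polynomial_decay --lr 1e-5 \
    --total-num-update 1018 --warmup-updates 61
```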