Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Authors: Yuheng Zha, Yichi Yang, Ruichen Li, Zhiting Hu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | extensive experiments show the model's efficiency and strong performance: (1) On over 20 datasets of aforementioned diverse tasks, the model matches or surpasses FLAN-T5 models that have around 2x or 10x more parameters; the single unified model also outperforms task-specific models finetuned on individual datasets |
| Researcher Affiliation | Academia | Yuheng Zha, Yichi Yang, Ruichen Li, Zhiting Hu (UC San Diego) {yzha, yiy067, rul014, zhh019}@ucsd.edu |
| Pseudocode | No | No pseudocode or algorithm block found in the paper. |
| Open Source Code | Yes | Code is made available at https://github.com/yuh-zha/Align |
| Open Datasets | Yes | In total, we collect 5.9M examples from 28 datasets to train our alignment model ALIGN. We include more details of our training setup and data in Appendix C. Specifically, we use RoBERTa [12] as a lightweight backbone language model, and attach three individual linear layers to predict the three types of alignment outputs, Pr(y_bin), Pr(y_3way), and y_reg, respectively. (Table 8 lists: SNLI [40], MultiNLI [7], SQuAD v2 [29] among others; see the architecture sketch after this table) |
| Dataset Splits | Yes | We use the validation split of SQuAD v2 and Simplified NQ as their test splits are not publicly available. For the combination of GPT-3.5 + Verifier and Simplified NQ, we also report the exact match and F1 scores with the best unanswerable threshold selected on the Simplified NQ validation split in parenthesis. We use the SQuAD v2 validation split to find the best unanswerable threshold that maximizes the F1 score. |
| Hardware Specification | Yes | GPUs: 2× 3090 (for ALIGN-base) and 4× A5000 (for ALIGN-large) (from Table 7) |
| Software Dependencies | No | The paper mentions 'RoBERTa' and the 'AdamW' optimizer but does not provide specific version numbers for software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | For the experiments in Section 4, we train ALIGN for 3 epochs with a batch size of 32, following common practice [12, 16]. Other hyperparameters are listed in Table 7. (Table 7 lists: Batch Size 32, Epochs 3, Learning Rate 1e-5, Weight Decay 0.1, Adam ε 1e-6, Warmup Ratio 0.06) |
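
To make the model description quoted in the Open Datasets row concrete, below is a minimal sketch of the stated layout: a RoBERTa backbone with three linear heads producing Pr(y_bin), Pr(y_3way), and y_reg. The class name, the use of the first-token representation, and the head activations are assumptions for illustration and are not taken from the paper or its released code.

```python
import torch.nn as nn
from transformers import RobertaModel


class AlignSketch(nn.Module):
    """Hypothetical sketch: RoBERTa encoder with three alignment heads."""

    def __init__(self, backbone_name: str = "roberta-base"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(backbone_name)
        hidden = self.encoder.config.hidden_size
        # Three individual linear layers, as described in the paper.
        self.bin_head = nn.Linear(hidden, 2)        # Pr(y_bin): binary alignment
        self.three_way_head = nn.Linear(hidden, 3)  # Pr(y_3way): 3-way alignment
        self.reg_head = nn.Linear(hidden, 1)        # y_reg: scalar alignment score

    def forward(self, input_ids, attention_mask):
        # First-token representation as the sequence summary (an assumption).
        h = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        return (
            self.bin_head(h).softmax(dim=-1),
            self.three_way_head(h).softmax(dim=-1),
            self.reg_head(h).squeeze(-1),
        )
```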
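Similarly, the hyperparameters quoted in the Experiment Setup row (Table 7) can be wired up roughly as follows. The linear warmup/decay schedule and the way the total step count is supplied are assumptions based on common practice, not details quoted from the paper; only the optimizer settings (lr 1e-5, weight decay 0.1, ε 1e-6, warmup ratio 0.06) come from Table 7.

```python
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup


def build_optimizer(model, num_training_steps: int):
    """Hypothetical setup matching the Table 7 hyperparameters."""
    optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=0.1, eps=1e-6)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.06 * num_training_steps),  # warmup ratio 0.06
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler
```

With a batch size of 32 over 3 epochs (as quoted above), `num_training_steps` would be the number of training examples divided by 32, times 3.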