Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Authors: Yuheng Zha, Yichi Yang, Ruichen Li, Zhiting Hu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | extensive experiments show the model's efficiency and strong performance: (1) On over 20 datasets of aforementioned diverse tasks, the model matches or surpasses FLAN-T5 models that have around 2x or 10x more parameters; the single unified model also outperforms task-specific models finetuned on individual datasets |
| Researcher Affiliation | Academia | Yuheng Zha, Yichi Yang, Ruichen Li, Zhiting Hu (UC San Diego) {yzha, yiy067, rul014, zhh019}@ucsd.edu |
| Pseudocode | No | No pseudocode or algorithm block found in the paper. |
| Open Source Code | Yes | Code is made available at https://github.com/yuh-zha/Align |
| Open Datasets | Yes | In total, we collect 5.9M examples from 28 datasets to train our alignment model ALIGN. We include more details of our training setup and data in Appendix C. Specifically, we use RoBERTa [12] as a lightweight backbone language model, and attach three individual linear layers to predict the three types of alignment outputs, Pr(y_bin), Pr(y_3way), and y_reg, respectively. (Table 8 lists: SNLI [40], MultiNLI [7], SQuAD v2 [29] among others; see the architecture sketch after this table) |
| Dataset Splits | Yes | We use the validation split of SQuAD v2 and Simplified NQ as their test splits are not publicly available. For the combination of GPT-3.5 + Verifier and Simplified NQ, we also report the exact match and F1 scores with the best unanswerable threshold selected on the Simplified NQ validation split in parenthesis. We use the SQuAD v2 validation split to find the best unanswerable threshold that maximizes the F1 score. |
| Hardware Specification | Yes | GPUs: 2× 3090 (for ALIGN-base) and 4× A5000 (for ALIGN-large) (from Table 7) |
| Software Dependencies | No | The paper mentions 'RoBERTa' and the 'AdamW' optimizer but does not provide specific version numbers for software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | For the experiments in Section 4, we train ALIGN for 3 epochs with a batch size of 32, following common practice [12, 16]. Other hyperparameters are listed in Table 7. (Table 7 lists: Batch Size 32, Epochs 3, Learning Rate 1e-5, Weight Decay 0.1, Adam ε 1e-6, Warmup Ratio 0.06) |
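
To make the model description quoted in the Open Datasets row concrete, below is a minimal sketch of the stated layout: a RoBERTa backbone with three linear heads producing Pr(y_bin), Pr(y_3way), and y_reg. The class name, the use of the first-token representation, and the head activations are assumptions for illustration and are not taken from the paper or its released code.

```python
import torch.nn as nn
from transformers import RobertaModel


class AlignSketch(nn.Module):
    """Hypothetical sketch: RoBERTa encoder with three alignment heads."""

    def __init__(self, backbone_name: str = "roberta-base"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(backbone_name)
        hidden = self.encoder.config.hidden_size
        # Three individual linear layers, as described in the paper.
        self.bin_head = nn.Linear(hidden, 2)        # Pr(y_bin): binary alignment
        self.three_way_head = nn.Linear(hidden, 3)  # Pr(y_3way): 3-way alignment
        self.reg_head = nn.Linear(hidden, 1)        # y_reg: scalar alignment score

    def forward(self, input_ids, attention_mask):
        # First-token representation as the sequence summary (an assumption).
        h = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        return (
            self.bin_head(h).softmax(dim=-1),
            self.three_way_head(h).softmax(dim=-1),
            self.reg_head(h).squeeze(-1),
        )
```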
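Similarly, the hyperparameters quoted in the Experiment Setup row (Table 7) can be wired up roughly as follows. The linear warmup/decay schedule and the way the total step count is supplied are assumptions based on common practice, not details quoted from the paper; only the optimizer settings (lr 1e-5, weight decay 0.1, ε 1e-6, warmup ratio 0.06) come from Table 7.

```python
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup


def build_optimizer(model, num_training_steps: int):
    """Hypothetical setup matching the Table 7 hyperparameters."""
    optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=0.1, eps=1e-6)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.06 * num_training_steps),  # warmup ratio 0.06
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler
```

With a batch size of 32 over 3 epochs (as quoted above), `num_training_steps` would be the number of training examples divided by 32, times 3.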