From Instance Training to Instruction Learning: Task Adapters Generation from Instructions
Authors: Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate TAGI on the Super-Natural Instructions and P3 datasets. The experimental results demonstrate that TAGI can match or even outperform traditional meta-trained models and other hypernetwork models, while significantly reducing computational requirements. |
| Researcher Affiliation | Collaboration | Huanxuan Liao (1,2), Shizhu He (1,2), Yao Xu (1,2), Yuanzhe Zhang (1,2), Yanchao Hao (3), Shengping Liu (4), Kang Liu (1,2), Jun Zhao (1,2). 1: The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; 2: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; 3: Platform and Content Group, Tencent, Beijing, China; 4: Unisound, Beijing, China |
| Pseudocode | Yes | Specifically, the complete process is shown in Figure 2: we initially train the LoRA modules (Section 3.2.2) on various upstream tasks (seen tasks) with task datasets of meta-train (T_train). Specifically, for N distinct upstream tasks, we independently train N LoRA modules, with each module denoted as θ_i for task T_i ∈ T_train, presumed to represent the optimal model for its respective task. Subsequently, TAGI is committed to building proprietary models for downstream tasks (unseen tasks). Its training process is bifurcated into two primary phases: hypernetwork pretraining (Section 3.2.3) and hypernetwork finetuning (Section 3.2.4), which encompasses distillation and alignment. |
| Open Source Code | Yes | Our code will be available at https://github.com/Xnhyacinth/TAGI. |
| Open Datasets | Yes | We evaluate TAGI on the Super-Natural Instructions and P3 datasets. Super-Natural Instructions: https://github.com/allenai/ natural-instructions P3: https://huggingface.co/datasets/bigscience/P3 |
| Dataset Splits | Yes | We categorize these tasks into three distinct non-overlapping groups for validating out-of-distribution generalization: meta-train (T_train), meta-valid (T_valid), and meta-test (T_test), assuming all tasks adhere to a text-to-text format. For SNI, we adhered to the default settings [13; 37], which include 100 examples per task for both the training and test splits. Table 5: Number of samples in given splits for each dataset. |
| Hardware Specification | Yes | All experiments were conducted on 4 A100 NVIDIA GPUs, each equipped with 80GB of memory, and eight A6000 NVIDIA GPUs with 48GB of memory. |
| Software Dependencies | Yes | Our implementations are based on huggingface transformers v4.23.1 [40] using PyTorch v1.13.1 [26] and deepspeed v0.10.0. |
| Experiment Setup | Yes | The complete stable hyperparameter set used for training runs can be found in Table 6. Table 6 includes Max Input Len, Max Output Len, Optimizer, Learning Rate, precision, # Training Steps, # Warmup Steps, Batch Size, Gradient Accumulation, and LoRA Rank. |
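The core idea quoted in the Pseudocode row (per-task LoRA teachers, then a hypernetwork that generates task adapters directly from instructions) can be sketched in miniature. The toy below is a hypothetical simplification, not the authors' implementation: the hypernetwork is a single random linear map rather than a Transformer, the dimensions are toy-sized, and the distillation and alignment phases are omitted. It only illustrates the shape of the computation: an instruction embedding is mapped to low-rank matrices A (r × d) and B (d × r), whose product B·A is the task-specific weight delta injected into the frozen backbone.

```python
import random

D, R = 8, 2  # toy hidden size and LoRA rank (assumed values, not from the paper)


def matmul(X, Y):
    """Plain-Python matrix multiply for small nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


class ToyHypernetwork:
    """Maps an instruction embedding to LoRA matrices (A, B)."""

    def __init__(self, embed_dim, seed=0):
        rng = random.Random(seed)
        # One randomly initialised linear map per generated parameter block.
        self.w_a = [[rng.gauss(0, 0.1) for _ in range(R * D)]
                    for _ in range(embed_dim)]
        self.w_b = [[rng.gauss(0, 0.1) for _ in range(D * R)]
                    for _ in range(embed_dim)]

    def generate(self, instruction_embedding):
        # Project the embedding, then reshape into low-rank factors.
        flat_a = matmul([instruction_embedding], self.w_a)[0]
        flat_b = matmul([instruction_embedding], self.w_b)[0]
        A = [flat_a[i * D:(i + 1) * D] for i in range(R)]  # r x d
        B = [flat_b[i * R:(i + 1) * R] for i in range(D)]  # d x r
        return A, B


hyper = ToyHypernetwork(embed_dim=4)
A, B = hyper.generate([0.1, -0.2, 0.3, 0.05])
delta_w = matmul(B, A)  # d x d weight update for a frozen layer
```

In the paper's actual pipeline, the generated adapters are trained to match the per-task LoRA teachers via knowledge distillation (the "hypernetwork finetuning" phase), so at test time an unseen task's instruction alone is enough to produce its adapter, with no per-task gradient updates.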