From Instance Training to Instruction Learning: Task Adapters Generation from Instructions

Authors: Huanxuan Liao, Shizhu He, Yao Xu, Yuanzhe Zhang, Yanchao Hao, Shengping Liu, Kang Liu, Jun Zhao

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate TAGI on the Super-Natural Instructions and P3 datasets. The experimental results demonstrate that TAGI can match or even outperform traditional meta-trained models and other hypernetwork models, while significantly reducing computational requirements.
Researcher Affiliation Collaboration Huanxuan Liao1,2, Shizhu He1,2, Yao Xu1,2, Yuanzhe Zhang1,2, Yanchao Hao3, Shengping Liu4, Kang Liu1,2, Jun Zhao1,2 1 The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China 2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China 3 Platform and Content Group, Tencent, Beijing, China 4 Unisound, Beijing, China
Pseudocode Yes Specifically, the complete process is shown in Figure 2: we initially train the LoRA modules (Section 3.2.2) on various upstream tasks (seen tasks) with the task datasets of meta-train (Ttrain). For N distinct upstream tasks, we independently train N LoRA modules, with the module for task Ti ∈ Ttrain presumed to represent the optimal model for its respective task. Subsequently, TAGI is committed to building proprietary models for downstream tasks (unseen tasks). Its training process is bifurcated into two primary phases: hypernetwork pretraining (Section 3.2.3) and hypernetwork finetuning (Section 3.2.4), which encompasses distillation and alignment.
Open Source Code Yes Our code will be available at https://github.com/Xnhyacinth/TAGI.
Open Datasets Yes We evaluate TAGI on the Super-Natural Instructions and P3 datasets. Super-Natural Instructions: https://github.com/allenai/natural-instructions P3: https://huggingface.co/datasets/bigscience/P3
Dataset Splits Yes We categorize these tasks into three distinct non-overlapping groups for validating out-of-distribution generalization: meta-train (Ttrain), meta-valid (Tvalid), and meta-test (Ttest), assuming all tasks adhere to a text-to-text format. For SNI, we adhered to the default settings [13; 37], which include 100 examples per task for both the training and test splits. Table 5: Number of samples in given splits for each dataset.
Hardware Specification Yes All experiments were conducted on four NVIDIA A100 GPUs, each equipped with 80GB of memory, and eight NVIDIA A6000 GPUs with 48GB of memory.
Software Dependencies Yes Our implementations are based on Hugging Face Transformers v4.23.1 [40] using PyTorch v1.13.1 [26] and DeepSpeed v0.10.0.
Experiment Setup Yes The complete set of stable hyperparameters used for training runs can be found in Table 6, which includes Max Input Len, Max Output Len, Optimizer, Learning Rate, Precision, # Training Steps, # Warmup Steps, Batch Size, Gradient Accumulation, and LoRA Rank.
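The pseudocode excerpt above describes a hypernetwork that generates task-specific LoRA adapters from instructions rather than training on task instances. As a minimal sketch of that core idea (the dimensions, the single-linear-layer hypernetwork, and all names here are illustrative assumptions, not the paper's actual architecture or objective), a generated adapter can be applied to a frozen base weight as W + BA:

```python
import numpy as np

rng = np.random.default_rng(0)

D, R, INSTR_DIM = 16, 4, 8  # hidden size, LoRA rank, instruction-embedding size (illustrative)

# Frozen base weight of one linear layer in the backbone.
W_base = rng.normal(size=(D, D))

# Hypothetical hypernetwork: a single linear map from an instruction
# embedding to the flattened LoRA factors A (R x D) and B (D x R).
H = rng.normal(scale=0.01, size=(INSTR_DIM, R * D + D * R))

def generate_lora(instruction_emb):
    """Generate task-specific LoRA factors from an instruction embedding."""
    flat = instruction_emb @ H
    A = flat[: R * D].reshape(R, D)
    B = flat[R * D:].reshape(D, R)
    return A, B

def adapted_forward(x, instruction_emb):
    """Forward pass through the frozen layer plus the generated adapter: x (W + BA)^T."""
    A, B = generate_lora(instruction_emb)
    return x @ (W_base + B @ A).T

x = rng.normal(size=(2, D))            # a batch of 2 hidden states
instr = rng.normal(size=INSTR_DIM)     # embedding of an unseen task's instruction
y = adapted_forward(x, instr)
print(y.shape)  # (2, 16)
```

In the paper's actual pipeline, the hypernetwork is first pretrained (Section 3.2.3) and then finetuned with knowledge distillation and alignment against the per-task LoRA modules (Section 3.2.4); the sketch only shows the inference-time shape of the generated adapter.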