Compositional Task Representations for Large Language Models
Authors: Nan Shao, Zefan Cai, Hanwei Xu, Chonghua Liao, Yanan Zheng, Zhilin Yang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive quantitative experiments to validate the cross-task generalization performance of CTR. We mainly consider two settings: the zero-label setting and the few-shot setting. Aside from the quantitative experiments, we perform qualitative analysis to understand the cross-task generalization ability by investigating the association between the discrete latent codes and the tasks. |
| Researcher Affiliation | Collaboration | Nan Shao¹, Zefan Cai², Hanwei Xu¹, Chonghua Liao³, Yanan Zheng³, Zhilin Yang³ ⁴ ⁵ ¹. ¹Recurrent AI, ²Beijing Jiaotong University, ³Tsinghua University, ⁴Shanghai Artificial Intelligence Laboratory, ⁵Shanghai Qi Zhi Institute |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. The methods are described in narrative text within Section 3.3. |
| Open Source Code | Yes | The code will be available at https://github.com/shaonan1993/CTR. |
| Open Datasets | Yes | We follow the T0 benchmark. The training part consists of 39 tasks of 8 task types... We follow T0 to use the accuracy on the validation split of test tasks as the metric. |
| Dataset Splits | Yes | We follow T0 to use the accuracy on the validation split of test tasks as the metric. We use this set of labeled data as our validation set to search for a high-performing code. (A code-search sketch follows this table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments were provided. |
| Software Dependencies | No | No software dependencies with version numbers (e.g., library names and versions) were explicitly stated. |
| Experiment Setup | Yes | For the first stage of training where the LLM is frozen, we use the Adam optimizer with a learning rate of 1e-2, a decay rate of 0.1, and a batch size of 2048. We use a codebook embedding dimension of 1024, which is the same as the hidden dimension. The CTR length is set at 10 and each position can be assigned values ranging from 0 to 127; i.e., the codebook size is 128. The hyperparameter β is set at 0.1. For the second training phase where all parameters are updated, we use the Adam optimizer with a learning rate of 1e-4 and a batch size of 1024. |
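To make the reported training configuration concrete, here is a minimal sketch of the two-stage setup in a PyTorch style. The module names (`codebook`, `llm`), the choice of LR scheduler, and the stand-in backbone are assumptions made for illustration only; the authors' actual implementation is the repository linked above (https://github.com/shaonan1993/CTR).

```python
# Minimal sketch of the reported CTR hyperparameters (illustrative, not the
# authors' code). Only the numbers quoted in the table above are taken from
# the paper; everything else is an assumption.
import torch
import torch.nn as nn

CTR_LENGTH = 10        # number of discrete code positions per task
CODEBOOK_SIZE = 128    # each position takes a value in [0, 127]
EMBED_DIM = 1024       # codebook embedding dim, same as the LLM hidden dim
BETA = 0.1             # weight of the commitment term in the VQ-style loss

# A single learnable codebook shared across CTR positions (an assumption; the
# excerpt only states the codebook size and embedding dimension).
codebook = nn.Embedding(CODEBOOK_SIZE, EMBED_DIM)

# Stage 1: the LLM is frozen, so only the codebook-side parameters are trained.
stage1_optimizer = torch.optim.Adam(codebook.parameters(), lr=1e-2)
# The "decay rate of 0.1" is modeled here as multiplicative LR decay; the exact
# schedule (per step vs. per epoch) is not specified in the excerpt.
stage1_scheduler = torch.optim.lr_scheduler.ExponentialLR(stage1_optimizer, gamma=0.1)
STAGE1_BATCH_SIZE = 2048

# Stage 2: all parameters (LLM backbone + codebook) are updated.
llm = nn.Linear(EMBED_DIM, EMBED_DIM)  # placeholder for the actual LLM backbone
stage2_optimizer = torch.optim.Adam(
    list(llm.parameters()) + list(codebook.parameters()), lr=1e-4
)
STAGE2_BATCH_SIZE = 1024
```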
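The "search for a high-performing code" step quoted under Dataset Splits can be read as a simple validation-accuracy search over candidate discrete codes. The sketch below illustrates that idea under stated assumptions; `evaluate_accuracy` and the candidate-generation strategy are hypothetical placeholders, not the paper's API.

```python
# Hedged sketch: pick the candidate code that scores highest on a small labeled
# validation set. The evaluation callable and candidate source are assumptions.
from typing import Callable, Iterable, List, Tuple

Code = Tuple[int, ...]     # e.g., 10 positions, each an index in [0, 127]
Example = Tuple[str, str]  # (input text, gold label)

def search_best_code(
    candidate_codes: Iterable[Code],
    validation_set: List[Example],
    evaluate_accuracy: Callable[[Code, List[Example]], float],
) -> Tuple[Code, float]:
    """Return the candidate code with the highest validation accuracy."""
    best_code: Code = ()
    best_acc = float("-inf")
    for code in candidate_codes:
        acc = evaluate_accuracy(code, validation_set)
        if acc > best_acc:
            best_code, best_acc = code, acc
    return best_code, best_acc
```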