Compositional Task Representations for Large Language Models

Authors: Nan Shao, Zefan Cai, Hanwei Xu, Chonghua Liao, Yanan Zheng, Zhilin Yang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive quantitative experiments to validate the performance of cross-task generalization of CTR. We mainly consider two settings, the zero-label setting and the few-shot setting. Aside from quantitative experiments, we perform qualitative analysis to understand the cross-task generalization ability by investigating the association between the discrete latent codes and the tasks.
Researcher Affiliation | Collaboration | Nan Shao (1), Zefan Cai (2), Hanwei Xu (1), Chonghua Liao (3), Yanan Zheng (3), Zhilin Yang (3, 4, 5, 1); affiliations: 1 Recurrent AI, 2 Beijing Jiaotong University, 3 Tsinghua University, 4 Shanghai Artificial Intelligence Laboratory, 5 Shanghai Qi Zhi Institute
Pseudocode | No | No structured pseudocode or algorithm blocks were found. The methods are described in narrative text within Section 3.3.
Open Source Code | Yes | The code will be available at https://github.com/shaonan1993/CTR.
Open Datasets | Yes | We follow the T0 benchmark. The training part consists of 39 tasks of 8 task types... We follow T0 to use the accuracy on the validation split of test tasks as the metric.
Dataset Splits | Yes | We follow T0 to use the accuracy on the validation split of test tasks as the metric. We use this set of labeled data as our validation set to search for a high-performing code. (A sketch of this code-search step follows the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types with specifications) used for running the experiments were provided.
Software Dependencies | No | No specific software dependencies, including version numbers (e.g., library names with versions), were explicitly stated.
Experiment Setup | Yes | For the first stage of training where the LLM is frozen, we use the Adam optimizer with a learning rate of 1e-2, a decay rate of 0.1, and a batch size of 2048. We use a codebook embedding dimension of 1024, which is the same as the hidden dimension. The CTR length is set at 10 and each position can be assigned values ranging from 0 to 127; i.e., the codebook size is 128. The hyperparameter β is set at 0.1. For the second training phase where all parameters are updated, we use the Adam optimizer with a learning rate of 1e-4 and a batch size of 1024. (A configuration sketch based on these numbers follows the table.)
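
The reported hyperparameters can be collected into a minimal PyTorch sketch. The module names (codebook, llm) are hypothetical stand-ins, and the reported "decay rate of 0.1" is read here as Adam weight decay, which may not match the authors' intent; only the numeric values come from the quoted setup.

```python
import torch

# Values taken from the reported experiment setup; everything else is assumed.
HIDDEN_DIM = 1024     # codebook embedding dimension (equal to the model hidden dimension)
CTR_LENGTH = 10       # number of discrete positions per task code
CODEBOOK_SIZE = 128   # each position takes a value in [0, 127]
BETA = 0.1            # weight of the beta hyperparameter in the training loss

# Hypothetical stand-ins for the real modules.
codebook = torch.nn.Embedding(CODEBOOK_SIZE, HIDDEN_DIM)
llm = torch.nn.Linear(HIDDEN_DIM, HIDDEN_DIM)  # placeholder for the LLM backbone

# Stage 1: the LLM is frozen; only the new parameters are trained.
for p in llm.parameters():
    p.requires_grad = False
stage1_optimizer = torch.optim.Adam(
    codebook.parameters(), lr=1e-2, weight_decay=0.1  # "decay rate" read as weight decay (assumption)
)
STAGE1_BATCH_SIZE = 2048

# Stage 2: all parameters are updated with a smaller learning rate.
for p in llm.parameters():
    p.requires_grad = True
stage2_optimizer = torch.optim.Adam(
    list(llm.parameters()) + list(codebook.parameters()), lr=1e-4
)
STAGE2_BATCH_SIZE = 1024
```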
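
The dataset-splits row quotes the paper as using a small labeled set as a validation set "to search for a high-performing code." The sketch below shows one plausible form of that search; search_task_code, candidate_codes, and score_accuracy are hypothetical names, and the paper's actual procedure for proposing candidate codes may differ.

```python
from typing import Callable, Sequence

def search_task_code(
    candidate_codes: Sequence[Sequence[int]],           # e.g., length-10 codes with values in [0, 127]
    score_accuracy: Callable[[Sequence[int]], float],   # validation accuracy when conditioning on a code
) -> Sequence[int]:
    """Return the candidate code with the highest accuracy on the labeled validation set."""
    best_code, best_acc = None, float("-inf")
    for code in candidate_codes:
        acc = score_accuracy(code)
        if acc > best_acc:
            best_code, best_acc = code, acc
    return best_code

# Toy usage with a stand-in scoring function; in practice score_accuracy would
# run the model on the validation split while conditioned on the given code.
if __name__ == "__main__":
    candidates = [[0] * 10, [1] * 10, [2] * 10]
    best = search_task_code(candidates, score_accuracy=lambda code: sum(code) / 100)
    print(best)  # the highest-scoring candidate, here [2, 2, ..., 2]
```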