Compositional Exemplars for In-context Learning
Authors: Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, Lingpeng Kong
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing. Extensive experiments demonstrate not only the state-of-the-art performance but also the transferability and compositionality of CEIL, shedding new light on in-context learning. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, The University of Hong Kong; ²Shark-NLP, Shanghai Artificial Intelligence Laboratory. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. The methods are described in narrative text and figures. |
| Open Source Code | Yes | Our code is released at https://github.com/HKUNLP/icl-ceil. |
| Open Datasets | Yes | All the datasets and tasks are listed in Table 1. These datasets involve different task formulations, thereby allowing for extensive evaluations of CEIL in varying scenarios. Prompts and examples of each dataset are shown in Appendix A.1. |
| Dataset Splits | Yes | Final results are reported on the validation set as the test set is private for some datasets. |
| Hardware Specification | Yes | We use Adam optimizer (Kingma & Ba, 2015) with batch size 128 and learning rate 1e-5, and run training for 30 epochs on two NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions software such as GPT-Neo, GPT2-XL, Codex, BERT, and Hugging Face Transformers, but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch). |
| Experiment Setup | Yes | The number of in-context examples is set to 50, and we truncate it based on the maximum context size for different LMs (e.g., 1,024 for GPT2-XL, 2,048 for GPT-Neo, and 8,001 for Codex) on each task. ... We use Adam optimizer (Kingma & Ba, 2015) with batch size 128 and learning rate 1e-5, and run training for 30 epochs on two NVIDIA A100 GPUs. For each task, we search the trade-off factor λ in {0.01, 0.05, 0.1}. (See the configuration sketch below.) |
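
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a minimal illustration under stated assumptions, not the released `icl-ceil` implementation: the retriever is replaced by a placeholder module and `truncate_to_context` is a hypothetical helper; only the numeric values (50 in-context examples, per-LM context sizes, Adam with batch size 128 and learning rate 1e-5, 30 epochs, λ ∈ {0.01, 0.05, 0.1}) come from the paper.

```python
# Minimal sketch of the reported training/inference configuration.
# Assumptions (NOT from the released icl-ceil code): the retriever is a tiny
# placeholder module and truncate_to_context is a hypothetical helper; only
# the hyperparameter values mirror the paper's experiment setup.
import torch
from torch import nn

# Values quoted from the paper's experiment setup.
NUM_IN_CONTEXT_EXAMPLES = 50
MAX_CONTEXT_SIZE = {"gpt2-xl": 1024, "gpt-neo": 2048, "codex": 8001}
BATCH_SIZE = 128
LEARNING_RATE = 1e-5
NUM_EPOCHS = 30
LAMBDA_GRID = [0.01, 0.05, 0.1]  # trade-off factor searched per task

# Placeholder retriever; the paper fine-tunes a BERT-based retriever instead.
retriever = nn.Linear(768, 768)
optimizer = torch.optim.Adam(retriever.parameters(), lr=LEARNING_RATE)


def truncate_to_context(exemplar_token_counts, lm_name, query_tokens):
    """Keep retrieved exemplars (up to 50) until the LM's context window is full."""
    budget = MAX_CONTEXT_SIZE[lm_name] - query_tokens
    kept = []
    for n_tokens in exemplar_token_counts[:NUM_IN_CONTEXT_EXAMPLES]:
        if n_tokens > budget:
            break
        budget -= n_tokens
        kept.append(n_tokens)
    return kept
```

For example, `truncate_to_context([300] * 50, "gpt2-xl", 200)` keeps only the first two 300-token exemplars, since GPT2-XL's 1,024-token window leaves an 824-token budget after the query.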