Compositional Exemplars for In-context Learning

Authors: Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, Lingpeng Kong

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing. Extensive experiments demonstrate not only the state-of-the-art performance but also the transferability and compositionality of CEIL, shedding new light on in-context learning.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, The University of Hong Kong; (2) Shark-NLP, Shanghai Artificial Intelligence Laboratory.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; the methods are described in narrative text and figures.
Open Source Code | Yes | Our code is released at https://github.com/HKUNLP/icl-ceil.
Open Datasets | Yes | All the datasets and tasks are listed in Table 1. These datasets involve different task formulations, thereby allowing for extensive evaluations of CEIL in varying scenarios. Prompts and examples of each dataset are shown in Appendix A.1.
Dataset Splits | Yes | Final results are reported on the validation set, as the test set is private for some datasets.
Hardware Specification | Yes | We use Adam optimizer (Kingma & Ba, 2015) with batch size 128 and learning rate 1e-5, and run training for 30 epochs on two NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions software like GPT-Neo, GPT2-XL, Codex, BERT, and Huggingface Transformers but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch).
Experiment Setup | Yes | The number of in-context examples is set to 50, and we truncate it based on the maximum context size for different LMs (e.g., 1,024 for GPT2-XL, 2,048 for GPT-Neo, and 8,001 for Codex) on each task. ... We use Adam optimizer (Kingma & Ba, 2015) with batch size 128 and learning rate 1e-5, and run training for 30 epochs on two NVIDIA A100 GPUs. For each task, we search the trade-off factor λ in {0.01, 0.05, 0.1}.
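
For reference, the quoted experiment setup can be collected into a single configuration object. The sketch below is illustrative only and assumes a PyTorch-style training loop; the class and function names (RetrieverTrainingConfig, build_optimizer) are hypothetical and are not taken from the released icl-ceil code, while the numeric values are the ones reported in the rows above.

```python
# Illustrative sketch of the reported setup (not the released HKUNLP/icl-ceil
# code). Names and structure are hypothetical; numeric values are quoted
# from the paper excerpt above.
from dataclasses import dataclass, field
from typing import Dict, List

import torch


@dataclass
class RetrieverTrainingConfig:
    # In-context example budget, truncated per LM by its maximum context size.
    num_in_context_examples: int = 50
    max_context_size: Dict[str, int] = field(default_factory=lambda: {
        "gpt2-xl": 1024,
        "gpt-neo": 2048,
        "codex": 8001,
    })
    # Optimization settings: Adam, batch size 128, learning rate 1e-5, 30 epochs.
    batch_size: int = 128
    learning_rate: float = 1e-5
    num_epochs: int = 30
    # Trade-off factor lambda is searched per task over these candidates.
    lambda_candidates: List[float] = field(
        default_factory=lambda: [0.01, 0.05, 0.1]
    )


def build_optimizer(model: torch.nn.Module,
                    cfg: RetrieverTrainingConfig) -> torch.optim.Adam:
    """Adam optimizer with the learning rate reported in the paper."""
    return torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)


if __name__ == "__main__":
    cfg = RetrieverTrainingConfig()
    # Placeholder module standing in for the dense retriever; the paper's
    # retriever is BERT-based, which this toy linear layer does not reproduce.
    retriever = torch.nn.Linear(768, 768)
    optimizer = build_optimizer(retriever, cfg)
    print(cfg)
    print(optimizer)
```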