Compositional Exemplars for In-context Learning

Authors: Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, Lingpeng Kong

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing. Extensive experiments demonstrate not only the state-of-the-art performance but also the transferability and compositionality of CEIL, shedding new light on in-context learning.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, The University of Hong Kong; (2) Shark-NLP, Shanghai Artificial Intelligence Laboratory.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; the methods are described in narrative text and figures.
Open Source Code | Yes | Our code is released at https://github.com/HKUNLP/icl-ceil.
Open Datasets | Yes | All the datasets and tasks are listed in Table 1. These datasets involve different task formulations, thereby allowing for extensive evaluations of CEIL in varying scenarios. Prompts and examples of each dataset are shown in Appendix A.1.
Dataset Splits | Yes | Final results are reported on the validation set, as the test set is private for some datasets.
Hardware Specification | Yes | We use Adam optimizer (Kingma & Ba, 2015) with batch size 128 and learning rate 1e-5, and run training for 30 epochs on two NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions software like GPT-Neo, GPT2-XL, Codex, BERT, and Huggingface Transformers but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch).
Experiment Setup | Yes | The number of in-context examples is set to 50, and we truncate it based on the maximum context size for different LMs (e.g., 1,024 for GPT2-XL, 2,048 for GPT-Neo, and 8,001 for Codex) on each task. ... We use Adam optimizer (Kingma & Ba, 2015) with batch size 128 and learning rate 1e-5, and run training for 30 epochs on two NVIDIA A100 GPUs. For each task, we search the trade-off factor λ in {0.01, 0.05, 0.1}.
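
For reference, the quoted experiment setup can be collected into a single configuration object. The sketch below is illustrative only and assumes a PyTorch-style training loop; the class and function names (RetrieverTrainingConfig, build_optimizer) are hypothetical and are not taken from the released icl-ceil code, while the numeric values are the ones reported in the rows above.

```python
# Illustrative sketch of the reported setup (not the released HKUNLP/icl-ceil
# code). Names and structure are hypothetical; numeric values are quoted
# from the paper excerpt above.
from dataclasses import dataclass, field
from typing import Dict, List

import torch


@dataclass
class RetrieverTrainingConfig:
    # In-context example budget, truncated per LM by its maximum context size.
    num_in_context_examples: int = 50
    max_context_size: Dict[str, int] = field(default_factory=lambda: {
        "gpt2-xl": 1024,
        "gpt-neo": 2048,
        "codex": 8001,
    })
    # Optimization settings: Adam, batch size 128, learning rate 1e-5, 30 epochs.
    batch_size: int = 128
    learning_rate: float = 1e-5
    num_epochs: int = 30
    # Trade-off factor lambda is searched per task over these candidates.
    lambda_candidates: List[float] = field(
        default_factory=lambda: [0.01, 0.05, 0.1]
    )


def build_optimizer(model: torch.nn.Module,
                    cfg: RetrieverTrainingConfig) -> torch.optim.Adam:
    """Adam optimizer with the learning rate reported in the paper."""
    return torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)


if __name__ == "__main__":
    cfg = RetrieverTrainingConfig()
    # Placeholder module standing in for the dense retriever; the paper's
    # retriever is BERT-based, which this toy linear layer does not reproduce.
    retriever = torch.nn.Linear(768, 768)
    optimizer = build_optimizer(retriever, cfg)
    print(cfg)
    print(optimizer)
```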