Why Larger Language Models Do In-context Learning Differently?

Authors: Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Preliminary experimental results on large base and chat models provide positive support for our analysis."
Researcher Affiliation | Academia | ¹University of Wisconsin-Madison, ²The University of Hong Kong.
Pseudocode | No | The paper contains mathematical derivations and proofs but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about releasing code for its methodology and no link to a code repository.
Open Datasets | Yes | "We conduct experiments on five prevalent NLP tasks, leveraging datasets from GLUE (Wang et al., 2018) tasks and Subj (Conneau & Kiela, 2018)." (See the loading sketch below.)
Dataset Splits | No | The paper mentions using "M = 16 in-context exemplars" but does not give the training, validation, and test splits for the datasets used (GLUE, Subj).
Hardware Specification | No | The paper does not report hardware details such as CPU or GPU models or memory specifications used to run the experiments.
Software Dependencies | No | The paper does not give version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "We follow the prior work on in-context learning (Wei et al., 2023b) and use M = 16 in-context exemplars. ... Accuracy is calculated over 1000 evaluation prompts per dataset and over 5 runs with different random seeds for each evaluation... we introduce noise by inverting an escalating percentage of in-context example labels." (See the evaluation sketch below.)
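
Since the datasets are marked open, here is a minimal sketch of how one might fetch them with the Hugging Face `datasets` library. The paper's excerpt does not name Hub identifiers: `glue`/`sst2` is only one of GLUE's tasks, and the `SetFit/subj` mirror for Subj is an assumption rather than a path given in the paper.

```python
from datasets import load_dataset  # pip install datasets

# GLUE tasks are hosted on the Hugging Face Hub; SST-2 shown as one example
# (the paper uses five tasks, but their Hub identifiers are not listed here).
sst2 = load_dataset("glue", "sst2")

# Subj (Conneau & Kiela, 2018) ships with SentEval; this Hub mirror name is
# an assumption, not a path given in the paper.
subj = load_dataset("SetFit/subj")

print(sst2)  # DatasetDict with train/validation/test splits
```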
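
The Experiment Setup row pins down a concrete protocol: 16 in-context exemplars per prompt, accuracy over 1000 evaluation prompts, 5 runs with different random seeds, and an escalating fraction of inverted exemplar labels. Below is a minimal Python sketch of that evaluation loop under stated assumptions; it is not the authors' code (none is released), and `build_prompt`, `dummy_model`, the prompt template, and the specific noise levels are illustrative.

```python
import random

M = 16        # in-context exemplars per prompt (from the paper)
N_EVAL = 1000 # evaluation prompts per dataset (from the paper)
N_SEEDS = 5   # runs with different random seeds (from the paper)

def build_prompt(exemplars, query, flip_frac, rng):
    """Assemble an ICL prompt, inverting `flip_frac` of the exemplar labels.

    Binary 0/1 labels are assumed for the inversion step; the prompt
    template itself is illustrative, not quoted from the paper.
    """
    flipped = set(rng.sample(range(len(exemplars)),
                             int(flip_frac * len(exemplars))))
    lines = []
    for i, (text, label) in enumerate(exemplars):
        shown = 1 - label if i in flipped else label
        lines.append(f"Input: {text}\nLabel: {shown}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

def evaluate(dataset, query_model, flip_frac):
    """Mean accuracy at one noise level over N_EVAL prompts and N_SEEDS seeds.

    `dataset` is a list of (text, label) pairs; `query_model` is a
    hypothetical callable mapping a prompt string to a predicted label.
    """
    accs = []
    for seed in range(N_SEEDS):
        rng = random.Random(seed)
        correct = 0
        for _ in range(N_EVAL):
            sample = rng.sample(dataset, M + 1)
            exemplars, (q_text, q_label) = sample[:M], sample[M]
            prompt = build_prompt(exemplars, q_text, flip_frac, rng)
            correct += int(query_model(prompt) == q_label)
        accs.append(correct / N_EVAL)
    return sum(accs) / len(accs)

if __name__ == "__main__":
    # Tiny stand-ins so the sketch runs end to end; swap in a real dataset
    # and a real model call in practice.
    demo_data = [(f"example {i}", i % 2) for i in range(100)]

    def dummy_model(prompt):
        return 1  # hypothetical model: always predicts label 1

    # Escalating label-noise levels (illustrative values).
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        acc = evaluate(demo_data, dummy_model, frac)
        print(f"flip {frac:.0%}: accuracy {acc:.3f}")
```

Seeding the RNG per run keeps exemplar sampling and label flipping reproducible, which mirrors the report's note about 5 runs with different random seeds.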