reproducibilityindex.ai

Why Larger Language Models Do In-context Learning Differently?

Authors: Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Preliminary experimental results on large base and chat models provide positive support for our analysis.
Researcher Affiliation	Academia	University of Wisconsin-Madison; The University of Hong Kong.
Pseudocode	No	The paper contains mathematical derivations and proofs but no explicit pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any explicit statement about providing open-source code for its methodology or a link to a code repository.
Open Datasets	Yes	We conduct experiments on five prevalent NLP tasks, leveraging datasets from GLUE (Wang et al., 2018) tasks and Subj (Conneau & Kiela, 2018).
Dataset Splits	No	The paper mentions using "M = 16 in-context exemplars" but does not provide specific training, validation, and test dataset splits for the datasets used (GLUE, Subj).
Hardware Specification	No	The paper does not provide any specific hardware details such as CPU or GPU models, or memory specifications, used for running the experiments.
Software Dependencies	No	The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup	Yes	We follow the prior work on in-context learning (Wei et al., 2023b) and use M = 16 in-context exemplars. ... Accuracy is calculated over 1000 evaluation prompts per dataset and over 5 runs with different random seeds for each evaluation... we introduce noise by inverting an escalating percentage of in-context example labels.
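The experiment setup row describes the evaluation only in outline: M = 16 exemplars per prompt, an escalating fraction of exemplar labels inverted, and accuracy averaged over 1000 evaluation prompts and 5 random seeds. The following is a minimal sketch of that protocol under stated assumptions, not the authors' code (the paper releases none). The function names `flip_labels`, `build_prompt`, `evaluate`, the `predict` callable standing in for a model, and the `Input:`/`Label:` prompt template are all hypothetical. It further assumes that "inverting" a label means swapping it for a different label from the task's label set (for a binary task, the other label), that exemplars and queries are resampled for every prompt, and that predictions are scored against the query's true label.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, label)


def flip_labels(exemplars: Sequence[Example], flip_fraction: float,
                label_set: Sequence[str], rng: random.Random) -> List[Example]:
    """Return a copy of `exemplars` with round(flip_fraction * M) labels inverted.

    Assumption: inverting replaces the label with a different label drawn
    from `label_set`; for a binary task this is simply the other label.
    """
    m = len(exemplars)
    n_flip = round(flip_fraction * m)
    flip_idx = set(rng.sample(range(m), n_flip))  # which exemplars get noisy labels
    noisy = []
    for i, (text, label) in enumerate(exemplars):
        if i in flip_idx:
            others = [l for l in label_set if l != label]
            label = rng.choice(others)
        noisy.append((text, label))
    return noisy


def build_prompt(exemplars: Sequence[Example], query: str) -> str:
    """Concatenate labeled exemplars, then the unlabeled query (hypothetical template)."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in exemplars]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


def evaluate(predict: Callable[[str], str], train_pool: Sequence[Example],
             eval_set: Sequence[Example], label_set: Sequence[str],
             flip_fraction: float, m: int = 16, n_prompts: int = 1000,
             seeds: Sequence[int] = (0, 1, 2, 3, 4)) -> float:
    """Mean accuracy over `seeds`, each run scoring `n_prompts` prompts.

    `predict` is a hypothetical stand-in for querying a language model.
    """
    run_accs = []
    for seed in seeds:
        rng = random.Random(seed)  # one independent RNG per run
        correct = 0
        for _ in range(n_prompts):
            exemplars = rng.sample(list(train_pool), m)
            noisy = flip_labels(exemplars, flip_fraction, label_set, rng)
            query, gold = rng.choice(list(eval_set))
            # Score against the true label of the query, not a flipped one.
            correct += int(predict(build_prompt(noisy, query)) == gold)
        run_accs.append(correct / n_prompts)
    return sum(run_accs) / len(run_accs)
```

Calling `evaluate` repeatedly while stepping `flip_fraction` upward (for example 0.0, 0.25, 0.5, 0.75, 1.0; these values are illustrative, not taken from the paper) would trace accuracy as label noise escalates. A separate RNG per seed keeps each of the runs reproducible on its own.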