Why Larger Language Models Do In-context Learning Differently?
Authors: Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Preliminary experimental results on large base and chat models provide positive support for our analysis. |
| Researcher Affiliation | Academia | University of Wisconsin-Madison and The University of Hong Kong. |
| Pseudocode | No | The paper contains mathematical derivations and proofs but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides neither a statement about releasing code for its methodology nor a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on five prevalent NLP tasks, leveraging datasets from GLUE (Wang et al., 2018) tasks and Subj (Conneau & Kiela, 2018). |
| Dataset Splits | No | The paper mentions using "M = 16 in-context exemplars" but does not provide specific training, validation, and test dataset splits for the datasets used (GLUE, Subj). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU or GPU models, or memory specifications, used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We follow the prior work on in-context learning (Wei et al., 2023b) and use M = 16 in-context exemplars. ... Accuracy is calculated over 1000 evaluation prompts per dataset and over 5 runs with different random seeds for each evaluation... we introduce noise by inverting an escalating percentage of in-context example labels. |
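
The Experiment Setup row fixes the key protocol numbers: M = 16 in-context exemplars, 1000 evaluation prompts per dataset, 5 runs with different random seeds, and noise injected by inverting an escalating percentage of in-context example labels. As a rough illustration only, the minimal Python sketch below shows what that evaluation loop could look like; the prompt template, the `model_predict` stand-in, and the binary 0/1 label encoding are assumptions for the sketch, not details taken from the paper.

```python
import random

M = 16            # in-context exemplars per prompt (from the paper)
N_PROMPTS = 1000  # evaluation prompts per dataset (from the paper)
SEEDS = range(5)  # 5 runs with different random seeds (from the paper)

def build_prompt(exemplars, query, flip_fraction, rng):
    """Build an in-context prompt, inverting `flip_fraction` of exemplar labels.

    `exemplars` is a list of (text, label) pairs; binary 0/1 labels are an
    assumption of this sketch. Flipped indices are sampled without replacement.
    """
    n_flip = int(round(flip_fraction * len(exemplars)))
    flip_idx = set(rng.sample(range(len(exemplars)), n_flip))
    lines = []
    for i, (text, label) in enumerate(exemplars):
        shown = 1 - label if i in flip_idx else label  # invert selected labels
        lines.append(f"Input: {text}\nLabel: {shown}")
    lines.append(f"Input: {query}\nLabel:")  # query to be completed by the model
    return "\n\n".join(lines)

def evaluate(model_predict, dataset, flip_fraction):
    """Mean accuracy over N_PROMPTS prompts, averaged across the 5 seeds.

    `model_predict(prompt) -> int` is a hypothetical stand-in for querying
    the LLM; `dataset` is a list of (text, label) pairs.
    """
    accs = []
    for seed in SEEDS:
        rng = random.Random(seed)
        correct = 0
        for _ in range(N_PROMPTS):
            sample = rng.sample(dataset, M + 1)  # M exemplars plus one query
            exemplars, (query, gold) = sample[:M], sample[M]
            prompt = build_prompt(exemplars, query, flip_fraction, rng)
            correct += int(model_predict(prompt) == gold)
        accs.append(correct / N_PROMPTS)
    return sum(accs) / len(accs)
```

Sweeping `flip_fraction` over values such as `[0.0, 0.25, 0.5, 0.75, 1.0]` would trace out the "escalating percentage of in-context example labels" axis the paper describes.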