Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
In-Context Learning Learns Label Relationships but Is Not Conventional Learning
Authors: Jannik Kossen, Yarin Gal, Tom Rainforth
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that ICL predictions almost always depend on in-context labels and that ICL can learn truly novel tasks in-context. |
| Researcher Affiliation | Academia | 1 OATML, Department of Computer Science, University of Oxford 2 Department of Statistics, University of Oxford |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code to reproduce our results at the following repository: github.com/jlko/in_context_learning. |
| Open Datasets | Yes | We evaluate on SST-2 (Socher et al., 2013), Subjective (Wang & Manning, 2012), Financial Phrasebank (Malo et al., 2014), Hate Speech (de Gibert et al., 2018), AG News (Zhang et al., 2015), Medical Questions Pairs (MQP) (McCreery et al., 2020), as well as Microsoft Research Paraphrase Corpus (MRPC) (Dolan & Brockett, 2005), Recognizing Textual Entailment (RTE) (Dagan et al., 2005), and Winograd Schema Challenge (WNLI) (Levesque et al., 2012) from GLUE (Wang et al., 2019). |
| Dataset Splits | No | The paper refers to the 'training set' and 'test set' of existing datasets but does not give split percentages, sample counts, or citations to predefined splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions using the 'Hugging Face Python library (Wolf et al., 2020) and PyTorch (Paszke et al., 2019)' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use the following simple templates to format the in-context examples. For SST-2, Subjectivity, Financial Phrasebank, Hate Speech, and our author identification task, we use the following line of Python code to format each input example: f"Sentence: {sentence} \n Answer: {label}\n\n". |
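The formatting template quoted above can be sketched as a small helper that builds an in-context prompt. This is a minimal illustration, not the authors' code: the function names and the demo sentences are hypothetical, while the f-string itself is reproduced verbatim from the paper.

```python
# Sketch of the in-context prompt construction using the template quoted
# in the paper: f"Sentence: {sentence} \n Answer: {label}\n\n".
# Function names and demo data are illustrative, not from the paper.

def format_example(sentence: str, label: str) -> str:
    """Format one labeled in-context example with the paper's template."""
    return f"Sentence: {sentence} \n Answer: {label}\n\n"

def build_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Concatenate formatted examples, then append the unlabeled query."""
    prompt = "".join(format_example(s, l) for s, l in examples)
    # The query reuses the template but leaves the answer for the model.
    prompt += f"Sentence: {query} \n Answer:"
    return prompt

demo = [("great movie", "positive"), ("terrible plot", "negative")]
print(build_prompt(demo, "a fine film"))
```

For the label-relationship experiments described in the paper, the `label` values passed in would be the (possibly randomized or flipped) in-context labels whose effect on predictions is being measured.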