Data Distributional Properties Drive Emergent In-Context Learning in Transformers

Authors: Stephanie Chan, Adam Santoro, Andrew Lampinen, Jane Wang, Aaditya Singh, Pierre Richemond, James McClelland, Felix Hill

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we experimentally manipulated the distributional properties of the training data and measured the effects on in-context few-shot learning. We performed our experiments over data sequences sampled from a standard image-based few-shot dataset (the Omniglot dataset; Lake et al., 2019). (See the sequence-construction sketch after the table.)
Researcher Affiliation | Collaboration | Stephanie C.Y. Chan, Adam Santoro, Andrew K. Lampinen, Jane X. Wang, Aaditya K. Singh (University College London), Pierre H. Richemond, James L. McClelland (DeepMind, Stanford University), Felix Hill (DeepMind)
Pseudocode | No | The paper describes its methods and experimental setup in narrative text and diagrams (e.g., Figure 1) but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at: https://github.com/deepmind/emergent_in_context_learning
Open Datasets | Yes | To investigate the factors that lead to in-context few-shot learning, we created training and evaluation sequences using the Omniglot dataset (Lake et al., 2019; MIT License), a standard image-label dataset for few-shot learning.
Dataset Splits | Yes | We evaluated trained models on two types of sequences, to measure (1) in-context learning and (2) in-weights learning. (See the evaluation-sequence sketch after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments; it only describes the transformer and recurrent models themselves.
Software Dependencies | No | The paper mentions using a ResNet for image embedding and a causal transformer model, citing relevant papers, but does not provide specific software versions for libraries or frameworks (e.g., Python, PyTorch, TensorFlow) that would be needed for replication.
Experiment Setup | Yes | Unless stated otherwise, we used a transformer with 12 layers and embedding size 64. The model was trained on a softmax cross-entropy loss on the prediction for the final (query) image. (See the query-loss sketch after the table.)
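
As a rough illustration of the training data referenced in the Research Type and Open Datasets rows, the sketch below builds a few-shot sequence of image-label context pairs followed by a query image from Omniglot-style data. The function name, context length, and the `query_repeats` parameter (a stand-in for the paper's "burstiness" manipulation) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def make_fewshot_sequence(images_by_class, rng, context_len=8, query_repeats=3):
    """Sample a context of `context_len` (image, label) pairs plus one query.

    `images_by_class` maps a class id to an array of images for that class.
    `query_repeats` controls how often the query's class appears in the
    context (hypothetical knob standing in for the burstiness manipulation).
    """
    classes = list(images_by_class)
    query_class = rng.choice(classes)

    # Fill the context: `query_repeats` exemplars of the query class,
    # the rest drawn from other classes, then shuffle the order.
    other = [c for c in classes if c != query_class]
    context_classes = ([query_class] * query_repeats +
                       list(rng.choice(other, context_len - query_repeats)))
    rng.shuffle(context_classes)

    context_images = [images_by_class[c][rng.integers(len(images_by_class[c]))]
                      for c in context_classes]
    query_image = images_by_class[query_class][
        rng.integers(len(images_by_class[query_class]))]

    return np.stack(context_images), np.array(context_classes), query_image, query_class

# Example usage with dummy Omniglot-sized (105x105) images:
rng = np.random.default_rng(0)
data = {c: np.zeros((20, 105, 105)) for c in range(16)}
ctx_imgs, ctx_labels, q_img, q_label = make_fewshot_sequence(data, rng)
```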
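The Dataset Splits row distinguishes two evaluation regimes. Continuing the hypothetical sketch above, the snippet below only illustrates that distinction; the paper's exact evaluation-sequence construction may differ.

```python
def make_eval_sequence(images_by_class, rng, mode, **kwargs):
    """Illustrative split between the two evaluation regimes:

    * "in_context": sequences built from holdout classes never seen in
      training, so the query label is resolvable only from the context pairs;
    * "in_weights": sequences whose context excludes the query's class, so
      the query is answerable only from information stored in the weights.
    """
    if mode == "in_context":
        # `images_by_class` is assumed to hold held-out classes here.
        return make_fewshot_sequence(images_by_class, rng, **kwargs)
    elif mode == "in_weights":
        # Trained classes, but the context carries no query-class exemplars.
        return make_fewshot_sequence(images_by_class, rng, query_repeats=0, **kwargs)
    raise ValueError(f"unknown mode: {mode}")
```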
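The Experiment Setup row quotes a loss computed only on the prediction for the final (query) image. A minimal NumPy sketch of such a query-only softmax cross-entropy, with illustrative shapes and names:

```python
import numpy as np

def query_cross_entropy(logits, labels):
    """logits: [batch, seq_len, num_classes]; labels: [batch, seq_len].

    Only the last position (the query) contributes to the loss.
    """
    query_logits = logits[:, -1, :]   # [batch, num_classes]
    query_labels = labels[:, -1]      # [batch]

    # Numerically stable log-softmax over the class dimension.
    shifted = query_logits - query_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    # Negative log-likelihood of the true query label, averaged over the batch.
    nll = -log_probs[np.arange(len(query_labels)), query_labels]
    return nll.mean()
```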