Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens
Authors: Ruifeng Ren, Yong Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, experiments are designed to support our findings. |
| Researcher Affiliation | Academia | Ruifeng Ren, Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China, EMAIL; Yong Liu, Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China, EMAIL |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Answer: [Yes] Justification: We have provided our code and instructions in the supplemental material. |
| Open Datasets | Yes | We choose the BERT-base-uncased model (can be downloaded from the Huggingface library [Wolf, 2019], hereafter referred to as BERT [Kenton and Toutanova, 2019]) to validate the effectiveness of modifications to the attention mechanism and select four relatively smaller GLUE datasets (CoLA, MRPC, STS-B, RTE) [Wang, 2018]. |
| Dataset Splits | No | The paper describes the ICL input structure (demonstration tokens and query tokens) and how some tokens serve as query tokens for prediction. However, it does not provide dataset split information (e.g., percentages, sample counts, or references to predefined splits) for training, validation, and test sets, particularly for the synthetic tasks, where data is generated on the fly. For the GLUE datasets, it reports batch size, learning rate, and epochs, but not how the datasets were split for validation. |
| Hardware Specification | Yes | The experiments are completed on a single 24GB NVIDIA GeForce RTX 3090 and can be completed within one day. ... All experiments are conducted on a single 24GB NVIDIA GeForce RTX 3090. |
| Software Dependencies | Yes | We choose the BERT-base-uncased model (can be downloaded from the Huggingface library [Wolf, 2019], hereafter referred to as BERT [Kenton and Toutanova, 2019]) |
| Experiment Setup | Yes | We set the dimension of the random features as dr = 100(dt + ds) = 1200 to obtain relatively accurate estimation. ... We choose stochastic gradient descent (SGD) [Amari, 1993] as the optimizer and we set the learning rate to 0.003 for normal and regularized models, while the remaining experiments to 0.005. ... we set the batch size to 32, the learning rate to 2e-5, and the number of epochs to 5 for all datasets. |
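The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. This is an illustration of the reported values only; the variable names and structure below are assumptions and do not come from the paper's released code.

```python
# Hyperparameters as reported in the paper's experiment setup.
# All names here (SYNTHETIC_ICL, GLUE_FINETUNE, etc.) are hypothetical.

# Random-feature dimension: d_r = 100 * (d_t + d_s) = 1200, so d_t + d_s = 12.
RANDOM_FEATURE_DIM = 100 * 12

SYNTHETIC_ICL = {
    "optimizer": "SGD",        # stochastic gradient descent [Amari, 1993]
    "lr_normal_regularized": 0.003,  # normal and regularized models
    "lr_other": 0.005,               # remaining experiments
    "random_feature_dim": RANDOM_FEATURE_DIM,
}

GLUE_FINETUNE = {
    "model": "bert-base-uncased",    # from the Huggingface library
    "datasets": ["CoLA", "MRPC", "STS-B", "RTE"],
    "batch_size": 32,
    "learning_rate": 2e-5,
    "epochs": 5,                     # same for all four datasets
}

if __name__ == "__main__":
    print(RANDOM_FEATURE_DIM)  # 1200
```

Collecting the values this way makes it easy to check internal consistency, e.g. that the stated random-feature dimension of 1200 matches the formula d_r = 100(d_t + d_s).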