Dual Operating Modes of In-Context Learning
Authors: Ziqian Lin, Kangwook Lee
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our analysis and predictions with extensive experiments with Transformers and LLMs. |
| Researcher Affiliation | Academia | (1) Department of Computer Science, University of Wisconsin-Madison, Madison, Wisconsin, USA; (2) Department of Electrical & Computer Engineering, University of Wisconsin-Madison, Madison, Wisconsin, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at: https://github.com/UW-Madison-Lee-Lab/Dual_Operating_Modes_of_ICL |
| Open Datasets | Yes | The classification task adopts five datasets: (i) glue-mrpc (Dolan & Brockett, 2005), (ii) glue-rte (Dagan et al., 2005), (iii) tweet_eval-hate (Barbieri et al., 2020), (iv) sick (Marelli et al., 2014), and (v) poem-sentiment (Sheng & Uthus, 2020). (A dataset-loading sketch follows the table.) |
| Dataset Splits | No | The paper describes using in-context examples for language models and evaluates ICL performance, but it does not specify explicit training, validation, or test dataset splits in the traditional machine learning sense for its experiments. |
| Hardware Specification | Yes | We perform inference on large models with 8 H100 GPUs with the package vLLM. (An inference sketch follows the table.) |
| Software Dependencies | No | The paper mentions software such as 'GPT2Model from the package Transformers supported by Hugging Face', 'AdamW', 'GitHub code released by Min et al. (2022)', and 'vLLM', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use a 10-layer, 8-head Transformer decoder with 1024-dimensional feedforward layers, and the input dimension is set to d, equal to the dimension of x. We train the model over three epochs, each consisting of 10,000 batches, with every batch containing 256 samples. We use AdamW (Loshchilov & Hutter, 2019) as the optimizer with a weight decay of 0.00001 and a learning rate of 0.00001. (A training-setup sketch follows the table.) |
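
The five classification datasets listed above are all publicly hosted on the Hugging Face Hub. Below is a minimal loading sketch using the `datasets` library; the hub identifiers are assumed from the dataset names quoted in the paper, and the split handling follows standard library defaults rather than the paper's exact pipeline (which builds on the code of Min et al., 2022).

```python
# Minimal sketch: loading the five classification datasets via the
# Hugging Face `datasets` library. Hub identifiers are assumed from the
# dataset names cited in the paper; split usage is not the paper's exact pipeline.
from datasets import load_dataset

corpora = {
    "glue-mrpc": load_dataset("glue", "mrpc"),
    "glue-rte": load_dataset("glue", "rte"),
    "tweet_eval-hate": load_dataset("tweet_eval", "hate"),
    "sick": load_dataset("sick"),
    "poem-sentiment": load_dataset("poem_sentiment"),
}

# Print the available splits and their sizes for each corpus.
for name, ds in corpora.items():
    print(name, {split: len(ds[split]) for split in ds})
```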
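
For the hardware row, the paper reports running inference on large models with 8 H100 GPUs via vLLM. The sketch below shows one plausible way to set that up with vLLM's tensor parallelism; the model name and prompt are placeholders, not necessarily those used in the paper.

```python
# Minimal sketch of multi-GPU inference with vLLM, assuming 8 GPUs used
# via tensor parallelism. The model name and prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=8)
params = SamplingParams(temperature=0.0, max_tokens=8)  # greedy label prediction

# A toy in-context classification prompt with one demonstration.
prompts = [
    "Review: a tender, heartfelt family drama.\nSentiment: positive\n"
    "Review: a boring, overlong mess.\nSentiment:"
]

outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text.strip())
```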
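
The experiment-setup row, combined with the paper's mention of GPT2Model from Hugging Face Transformers, suggests roughly the following training loop. This is a sketch under assumptions: the value of d, the data generator, and the loss are placeholders, and only the quoted hyperparameters (10 layers, 8 heads, 1024-dimensional feedforward, 3 epochs of 10,000 batches of 256 samples, AdamW with lr and weight decay 1e-5) are taken from the paper.

```python
# Minimal sketch of the reported training setup, assuming a GPT2Model
# decoder backbone fed with continuous embeddings. d, the data, and the
# loss below are placeholders; hyperparameters follow the quoted values.
import torch
from torch.optim import AdamW
from transformers import GPT2Config, GPT2Model

d = 16  # placeholder for the dimension of x (must be divisible by n_head here)
config = GPT2Config(
    n_layer=10,      # 10-layer Transformer decoder
    n_head=8,        # 8 attention heads
    n_embd=d,        # input/model dimension set to d
    n_inner=1024,    # 1024-dimensional feedforward layers
    vocab_size=1,    # unused: inputs are passed as embeddings, not token ids
)
model = GPT2Model(config)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=1e-5)

# 3 epochs x 10,000 batches x 256 samples per batch, as reported.
for epoch in range(3):
    for step in range(10_000):
        batch = torch.randn(256, 32, d)              # placeholder in-context sequences
        hidden = model(inputs_embeds=batch).last_hidden_state
        loss = hidden.pow(2).mean()                  # placeholder objective
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```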