In-Context Language Learning: Architectures and Algorithms
Authors: Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate a diverse set of neural sequence models on regular ICLL tasks. We first show that Transformers significantly outperform neural sequence models with recurrent or convolutional representations on ICLL tasks. |
| Researcher Affiliation | Academia | Ekin Akyürek¹, Bailin Wang¹, Yoon Kim¹, Jacob Andreas¹; ¹MIT CSAIL. Correspondence to: Ekin Akyürek <akyurek@mit.edu>. |
| Pseudocode | Yes | Algorithm 1: In-context n-gram language model with back-off (a minimal sketch of this technique follows the table). |
| Open Source Code | Yes | Code & data are released at github.com/berlino/seq_icl. |
| Open Datasets | Yes | URL: https://huggingface.co/datasets/cerebras/SlimPajama-627B |
| Dataset Splits | No | Finally, divide this collection of instances into training and test sets. We perform an exhaustive search over the grid of hyper-parameters in Table 3 and pick the best setting based on the validation set for ICLL and AR separately. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) were provided for the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') are explicitly mentioned in the paper's main text or appendices. |
| Experiment Setup | Yes | Table 3 (hyper-parameter search space for neural models): hidden size [64, 128, 256, 512, 1024]; number of layers [1, 2, 4, 8, 12]; number of heads [1, 2, 4]; epochs [200, 400]; batch size 32; optimizer AdamW; learning rate [1e-4, 2.5e-4]; weight decay [0.01, 0.1]; βs (0.9, 0.99); scheduler: cosine with warm-up; minimum learning rate 2.5e-5; warm-up start learning rate 1e-7; warm-up steps 25000. We perform an exhaustive search over this grid and pick the best setting based on the validation set for ICLL and AR separately (see the grid-search sketch below). |
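The pseudocode row quotes the paper's Algorithm 1, an in-context n-gram language model with back-off. As a rough illustration of that family of estimators, the Python sketch below counts n-grams over the in-context sequence and predicts the next token from the longest matching suffix, backing off with a fixed discount whenever a longer n-gram is unseen. The function name, the stupid-back-off-style `alpha` discount, and the final renormalization are illustrative assumptions, not the paper's exact Algorithm 1.

```python
from collections import Counter

def ngram_backoff_probs(context, vocab, max_n=3, alpha=0.4):
    """Distribution over the next token, estimated from n-gram counts
    accumulated over the in-context sequence, backing off to shorter
    histories when a longer n-gram is unseen. A sketch of the general
    technique; the paper's Algorithm 1 may differ in its details."""
    # counts[n] maps each observed n-gram (tuple of tokens) to its count.
    counts = [Counter() for _ in range(max_n + 1)]
    for n in range(1, max_n + 1):
        for i in range(len(context) - n + 1):
            counts[n][tuple(context[i:i + n])] += 1

    def score(token):
        # Longest-match first: try the (max_n - 1)-token suffix of the
        # context, then shorter suffixes, discounting by alpha per step.
        for n in range(min(max_n - 1, len(context)), 0, -1):
            h = tuple(context[-n:])
            if counts[n][h] > 0 and counts[n + 1][h + (token,)] > 0:
                return (alpha ** (max_n - 1 - n)
                        * counts[n + 1][h + (token,)] / counts[n][h])
        # Unigram fallback with the maximal back-off discount.
        return alpha ** (max_n - 1) * counts[1][(token,)] / max(len(context), 1)

    scores = {t: score(t) for t in vocab}
    z = sum(scores.values()) or 1.0  # renormalize over the vocabulary
    return {t: s / z for t, s in scores.items()}

# Example: after the context "abcabcab", mass should concentrate on "c".
ctx = list("abcabcab")
print(ngram_backoff_probs(ctx, vocab=sorted(set(ctx))))
```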
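The Table 3 grid also lends itself to a short reproduction harness. The sketch below transcribes the search space from the experiment-setup row and exhaustively selects the best configuration on a validation set, mirroring the paper's protocol of tuning ICLL and AR separately. `train_and_validate` is a hypothetical callback standing in for the actual training code released at github.com/berlino/seq_icl; all other names are illustrative.

```python
import itertools

# Hyper-parameter search space transcribed from Table 3.
GRID = {
    "hidden_size": [64, 128, 256, 512, 1024],
    "num_layers": [1, 2, 4, 8, 12],
    "num_heads": [1, 2, 4],
    "epochs": [200, 400],
    "learning_rate": [1e-4, 2.5e-4],
    "weight_decay": [0.01, 0.1],
}

# Settings held fixed across all runs, per Table 3.
FIXED = {
    "batch_size": 32,
    "optimizer": "AdamW",
    "betas": (0.9, 0.99),
    "scheduler": "cosine_with_warmup",
    "min_lr": 2.5e-5,
    "warmup_start_lr": 1e-7,
    "warmup_steps": 25_000,
}

def grid_search(train_and_validate):
    """Evaluate every configuration in GRID and return the one with the
    best validation score. Run once per task (ICLL and AR separately)."""
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*GRID.values()):
        cfg = {**FIXED, **dict(zip(GRID.keys(), values))}
        score = train_and_validate(cfg)  # hypothetical training callback
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```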