Breaking through the learning plateaus of in-context learning in Transformer
Authors: Jingwen Fu, Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By conducting meticulous and controlled experiments on synthetic tasks, we note that the persistence of learning plateaus correlates with compromised functionality of the weights component. The effectiveness of these strategies is further confirmed in natural language processing tasks. |
| Researcher Affiliation | Collaboration | (1) National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; (2) Tsinghua University; (3) Microsoft Research Asia. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions leveraging the open-source Pythia 13B checkpoints, which are a third-party resource. It does not provide any statement or link indicating that the authors' own code for the described methodology is publicly available. |
| Open Datasets | Yes | We propose a task using the Shapes3D (Kim & Mnih, 2018) dataset for a more controllable study. The dataset is constructed based on the SST (Socher et al., 2013) dataset. |
| Dataset Splits | No | The paper specifies a 'training image set (80%)' and 'test image set (20%)' for Shapes3D but does not explicitly provide percentages or counts for a distinct validation split. (A hedged split sketch is given below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running its experiments. It mentions 'computational resources' generally but no specific hardware. |
| Software Dependencies | No | The paper mentions software components like 'VAE', 'Adam optimizer', and 'GPT2 model' but does not specify their version numbers or any other software dependencies with version information required for replication. |
| Experiment Setup | Yes | We utilize a batch size of 128 and set the learning rate to 0.0001. For the SST-ICL task and the Word Selection task, the models are both trained using the AdamW optimizer with learning rate 2e-5. We choose the batch size as 64. (Hedged configuration sketches are given below the table.) |
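
A minimal sketch of the reported 80/20 Shapes3D train/test partition, assuming a seeded random shuffle over image indices. The function name `split_shapes3d_indices`, the seed, and the shuffling strategy are illustrative assumptions; the paper does not release code or describe the split procedure beyond the percentages.

```python
import numpy as np

def split_shapes3d_indices(num_images: int, train_fraction: float = 0.8, seed: int = 0):
    """Randomly partition image indices into a training set (80%) and a test set (20%)."""
    rng = np.random.default_rng(seed)          # seed is an assumption, not reported
    indices = rng.permutation(num_images)      # shuffle all image indices
    cut = int(train_fraction * num_images)
    return indices[:cut], indices[cut:]

# Shapes3D contains 480,000 images; the split below mirrors the reported 80%/20% ratio.
train_idx, test_idx = split_shapes3d_indices(num_images=480_000)
```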
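Likewise, a hedged PyTorch sketch of the two reported training configurations. The helper names are hypothetical, the choice of Adam for the Shapes3D-based synthetic task is inferred from the paper's mention of the Adam optimizer, and the model argument is a stand-in, since the actual architectures and training code are not public.

```python
import torch

def synthetic_task_optimizer(model: torch.nn.Module):
    """Shapes3D-based synthetic task: Adam (assumed), learning rate 1e-4, batch size 128 (as reported)."""
    return torch.optim.Adam(model.parameters(), lr=1e-4), 128

def nlp_task_optimizer(model: torch.nn.Module):
    """SST-ICL and Word Selection tasks: AdamW, learning rate 2e-5, batch size 64 (as reported)."""
    return torch.optim.AdamW(model.parameters(), lr=2e-5), 64

# Example with a placeholder module; the paper's Transformer models are not released.
optimizer, batch_size = nlp_task_optimizer(torch.nn.Linear(8, 8))
```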