In-Context Learning State Vector with Inner and Momentum Optimization
Authors: Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments using Llama-2 and GPT-J in both zero-shot setting and few-shot setting. The experimental results show that our optimization method effectively enhances the state vector and achieves the state-of-the-art performance on diverse tasks. |
| Researcher Affiliation | Academia | Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang. Harbin Institute of Technology (Shenzhen), Shenzhen, China. {crazyofapple, liuzhenyuhit}@gmail.com {hubaotian, zhangmin2021}@hit.edu.cn |
| Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode blocks or algorithm figures. |
| Open Source Code | Yes | Code is available at https://github.com/HITsz-TMG/ICL-State-Vector |
| Open Datasets | Yes | Linguistics includes Antonym [Nguyen et al., 2017], Capitalize, Present-Past, and Singular Plural [Todd et al., 2023], focusing on transformations in the form or meaning of words. Translation is represented by the English-French [Lample et al., 2018] dataset, which involves translating English words into their French counterparts. Knowledge comprises Country-Capital [Todd et al., 2023], AG News [Zhang et al., 2015], Person-Sport, Person-Instrument, Person-Occupation, Product-Company, and Landmark Country [Hernandez et al., 2023], which are centred around question-to-answer mappings for commonsense knowledge queries. |
| Dataset Splits | Yes | The remaining instances are split into test and development sets with a 7:3 ratio. |
| Hardware Specification | Yes | We run all the experiments on a single NVIDIA A100 80G GPU. |
| Software Dependencies | No | The paper mentions using Llama-2 and GPT-J models but does not specify software versions for libraries like PyTorch, TensorFlow, or Python itself, which would be necessary for reproduction. |
| Experiment Setup | Yes | Each subset consists of 10 instances for demonstrations and one instance for a dummy query since we employ a 10-shot as the default ICL setting. ... We find the best layer for different tasks via the accuracy of the development set. For the inner optimization in 4.2, we choose the last seven state vectors to optimize. ... For the momentum optimization, we choose 0.5 as the retention rate for historical momentum from the options of 0.25, 0.5 and 0.75. |
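The momentum optimization quoted above (retention rate 0.5 for historical momentum) can be sketched as an exponential-moving-average update over per-step state vectors. This is a hedged illustration only: the function name, the NumPy representation, and the EMA form are assumptions for exposition, not the authors' released implementation.

```python
import numpy as np

def momentum_aggregate(state_vectors, beta=0.5):
    """Aggregate a sequence of ICL state vectors with momentum.

    state_vectors: iterable of same-shape arrays (one per optimization step).
    beta: retention rate for historical momentum; the paper selects 0.5
          from {0.25, 0.5, 0.75} on the development set.
    """
    momentum = np.zeros_like(state_vectors[0], dtype=float)
    for v in state_vectors:
        # Keep a beta fraction of the history, blend in the new vector.
        momentum = beta * momentum + (1.0 - beta) * v
    return momentum
```

For example, aggregating the last seven state vectors (as in the paper's inner-optimization setting) would call `momentum_aggregate(vectors[-7:])`; whether the released code combines the two steps this way is not stated in the table above.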