In-Context Learning State Vector with Inner and Momentum Optimization

Authors: Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments using Llama-2 and GPT-J in both zero-shot setting and few-shot setting. The experimental results show that our optimization method effectively enhances the state vector and achieves the state-of-the-art performance on diverse tasks.
Researcher Affiliation | Academia | Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang — Harbin Institute of Technology (Shenzhen), Shenzhen, China; {crazyofapple, liuzhenyuhit}@gmail.com, {hubaotian, zhangmin2021}@hit.edu.cn
Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode blocks or algorithm figures.
Open Source Code | Yes | Code is available at https://github.com/HITsz-TMG/ICL-State-Vector
Open Datasets | Yes | Linguistics includes Antonym [Nguyen et al., 2017], Capitalize, Present-Past, and Singular-Plural [Todd et al., 2023], focusing on transformations in the form or meaning of words. Translation is represented by the English-French [Lample et al., 2018] dataset, which involves translating English words into their French counterparts. Knowledge comprises Country-Capital [Todd et al., 2023], AG News [Zhang et al., 2015], Person-Sport, Person-Instrument, Person-Occupation, Product-Company, and Landmark-Country [Hernandez et al., 2023], which are centred around question-to-answer mappings for commonsense knowledge queries.
Dataset Splits | Yes | The remaining instances are split into test and development sets with a 7:3 ratio.
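The 7:3 test/development split described above can be sketched as follows. This is a minimal illustration, not the authors' code; the helper name `split_test_dev` and the fixed seed are assumptions for the example.

```python
import random

def split_test_dev(instances, test_ratio=0.7, seed=42):
    """Split the remaining instances into test and development sets
    at a test_ratio : (1 - test_ratio) ratio, here 7:3.

    Hypothetical helper illustrating the split described in the paper;
    the authors' actual splitting procedure may differ.
    """
    rng = random.Random(seed)
    shuffled = instances[:]        # copy so the input list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]

test_set, dev_set = split_test_dev(list(range(100)))
print(len(test_set), len(dev_set))  # 70 30
```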
Hardware Specification | Yes | We run all the experiments on a single NVIDIA A100 80G GPU.
Software Dependencies | No | The paper mentions using Llama-2 and GPT-J models but does not specify software versions for libraries like PyTorch, TensorFlow, or Python itself, which would be necessary for reproduction.
Experiment Setup | Yes | Each subset consists of 10 instances for demonstrations and one instance for a dummy query since we employ a 10-shot as the default ICL setting. ... We find the best layer for different tasks via the accuracy of the development set. For the inner optimization in 4.2, we choose the last seven state vectors to optimize. For the momentum optimization, we choose 0.5 as the retention rate for historical momentum from the options of 0.25, 0.5 and 0.75.
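The momentum optimization with a retention rate of 0.5 can be sketched as an exponential-moving-average-style update over a sequence of state vectors. This is a simplified illustration under assumed names (`momentum_aggregate`, `beta`); the authors' actual update rule, applied to transformer-derived state vectors, may differ in detail.

```python
def momentum_aggregate(state_vectors, beta=0.5):
    """Momentum-style aggregation over a sequence of state vectors:
    m_t = beta * m_{t-1} + v_t, with beta the retention rate for
    historical momentum (0.5 per the paper's reported choice).

    Illustrative sketch only, using plain Python lists as vectors.
    """
    momentum = [0.0] * len(state_vectors[0])
    for v in state_vectors:
        momentum = [beta * m + x for m, x in zip(momentum, v)]
    return momentum

vecs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(momentum_aggregate(vecs))  # [2.25, 2.5]
```

A higher retention rate weights earlier state vectors more heavily; the paper reports selecting 0.5 from {0.25, 0.5, 0.75} on the development set.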