Mixture of Demonstrations for In-Context Learning

Authors: Song Wang, Zihan Chen, Chengshuai Shi, Cong Shen, Jundong Li

NeurIPS 2024

Reproducibility
Variable | Result | LLM Response
Research Type | Experimental | "We validate MoD via experiments across a range of NLP datasets and tasks, demonstrating its state-of-the-art performance and shedding new light on the future design of retrieval methods for ICL."
Researcher Affiliation | Academia | Song Wang, Zihan Chen, Chengshuai Shi, Cong Shen, Jundong Li (all University of Virginia; sw3wv@virginia.edu, brf3rx@virginia.edu, cs7ync@virginia.edu, cong@virginia.edu, jundong@virginia.edu)
Pseudocode | Yes | "We outline the training process in Algorithm 1, with each phase introduced in the following sections."
Open Source Code | Yes | "We provide the code at https://github.com/SongW-SW/MoD."
Open Datasets | Yes | "Table 1: The datasets used in experiments and their corresponding tasks. # Train and # Validation denote the numbers of samples during training and validation, respectively. # Demo denotes the average number of demonstrations used in each task during validation. # Expert represents the number of experts used in each task."
Dataset Splits | Yes | Same evidence as Open Datasets: the Table 1 caption documents the per-dataset # Train and # Validation sample counts used as the splits.
Hardware Specification | Yes | "We conduct experiments on two NVIDIA A100 GPUs, each with 80GB of memory." (A verification sketch follows the table.)
Software Dependencies | No | The paper mentions software such as Sentence-BERT, the bert-base-uncased model, and Hugging Face Transformers, but does not provide version numbers for these dependencies. (A version-recording sketch follows the table.)
Experiment Setup | Yes | "To keep consistency with CEIL [57] and EPR [34], we primarily use GPT-Neo [4], a 2.7-billion-parameter language model trained on The Pile [10]... The number of in-context demonstrations in our experiments is set as 50... Regarding the experiments in this work, we use a batch size of 128 and a learning rate of 10^-5. We set the size of the candidate demonstration set as K = 50. The size of the positive demonstration set is K̃ = 10." (A configuration sketch follows the table.)
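
To check whether a local machine matches the reported hardware, the short PyTorch sketch below lists each visible GPU and its memory. This is a convenience snippet for reproducers, not part of the paper or its released code.

```python
# Convenience sketch (not from the paper): list visible GPUs to compare
# against the reported setup of two NVIDIA A100 GPUs with 80GB each.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
```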
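Because the paper does not pin dependency versions, reproducers need to record their own environment. A minimal sketch follows, assuming the usual PyPI names for the tools the paper mentions (Hugging Face Transformers, Sentence-BERT via sentence-transformers, and PyTorch):

```python
# Record the installed versions of the libraries the paper mentions but
# does not pin. Package names are assumptions based on standard PyPI names.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("transformers", "sentence-transformers", "torch"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```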
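The quoted setup can also be summarized in code. The sketch below merely restates the reported configuration; it is not the authors' implementation (see https://github.com/SongW-SW/MoD for that), and the Hugging Face model identifier is an assumption based on the paper's description of GPT-Neo 2.7B.

```python
# Illustrative sketch of the reported experiment setup. Hyperparameter
# values come from the quoted passage; the model ID is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "EleutherAI/gpt-neo-2.7B"  # GPT-Neo, 2.7B parameters, trained on The Pile

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

BATCH_SIZE = 128      # training batch size
LEARNING_RATE = 1e-5  # learning rate (10^-5 in the paper)
K_CANDIDATE = 50      # size of the candidate demonstration set K
K_POSITIVE = 10       # size of the positive demonstration set K̃
NUM_DEMOS = 50        # in-context demonstrations per test query
```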