reproducibilityindex.ai

Mixture of Demonstrations for In-Context Learning

Authors: Song Wang, Zihan Chen, Chengshuai Shi, Cong Shen, Jundong Li

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate MoD via experiments across a range of NLP datasets and tasks, demonstrating its state-of-the-art performance and shedding new light on the future design of retrieval methods for ICL.
Researcher Affiliation	Academia	Song Wang, University of Virginia, sw3wv@virginia.edu; Zihan Chen, University of Virginia, brf3rx@virginia.edu; Chengshuai Shi, University of Virginia, cs7ync@virginia.edu; Cong Shen, University of Virginia, cong@virginia.edu; Jundong Li, University of Virginia, jundong@virginia.edu
Pseudocode	Yes	We outline the training process in Algorithm 1, with each phase introduced in the following sections.
Open Source Code	Yes	We provide the code at https://github.com/SongW-SW/MoD.
Open Datasets	Yes	Table 1: The datasets used in experiments and their corresponding tasks. # Train and # Validation denote the numbers of samples during training and validation, respectively. # Demo denotes the average number of demonstrations used in each task during validation. # Expert represents the number of experts used in each task.
Dataset Splits	Yes	Table 1: The datasets used in experiments and their corresponding tasks. # Train and # Validation denote the numbers of samples during training and validation, respectively. # Demo denotes the average number of demonstrations used in each task during validation. # Expert represents the number of experts used in each task.
Hardware Specification	Yes	We conduct experiments on two NVIDIA A100 GPUs, each with 80GB of memory.
Software Dependencies	No	The paper mentions software like Sentence-BERT, bert-base-uncased model, and Huggingface Transformers, but does not provide specific version numbers for these dependencies.
Experiment Setup	Yes	To keep consistency with CEIL [57] and EPR [34], we primarily use GPT-Neo [4], a 2.7-billion-parameter language model trained on The Pile [10]... The number of in-context demonstrations in our experiments is set as 50... Regarding the experiments in this work, we use a batch size of 128 and a learning rate of 10^-5. We set the size of the candidate demonstration set as K = 50. The size of the positive demonstration set is K̃ = 10.
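
For anyone attempting a rerun, the setup values quoted in the last row can be gathered into a single configuration object. The sketch below is illustrative only and is not taken from the authors' repository: the class name `MoDSetup`, its field names, and the Hugging Face model ID are assumptions, and the learning rate is read as 10^-5 from the extracted text.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MoDSetup:
    """Hypothetical container for the hyperparameters quoted in the table above."""

    # Assumed Hugging Face Hub ID for the 2.7B GPT-Neo model named in the paper.
    llm_name: str = "EleutherAI/gpt-neo-2.7B"
    # Number of in-context demonstrations per query.
    num_demonstrations: int = 50
    batch_size: int = 128
    # Assumption: the extracted "10 5" is read as 10^-5.
    learning_rate: float = 1e-5
    # K: size of the candidate demonstration set.
    candidate_set_size: int = 50
    # K-tilde: size of the positive demonstration set.
    positive_set_size: int = 10

    def __post_init__(self) -> None:
        # The positive set is a subset of the candidates, so it cannot be larger.
        if not 0 < self.positive_set_size <= self.candidate_set_size:
            raise ValueError("positive_set_size must be in (0, candidate_set_size]")
        if self.batch_size <= 0 or self.learning_rate <= 0:
            raise ValueError("batch_size and learning_rate must be positive")


setup = MoDSetup()
print(asdict(setup))
```

Freezing the dataclass keeps the recorded values from being changed mid-run, and the check in `__post_init__` catches an inconsistent pair of set sizes before any training starts.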