Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Mixture of Demonstrations for In-Context Learning
Authors: Song Wang, Zihan Chen, Chengshuai Shi, Cong Shen, Jundong Li
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate MoD via experiments across a range of NLP datasets and tasks, demonstrating its state-of-the-art performance and shedding new light on the future design of retrieval methods for ICL. |
| Researcher Affiliation | Academia | Song Wang, University of Virginia; Zihan Chen, University of Virginia; Chengshuai Shi, University of Virginia; Cong Shen, University of Virginia; Jundong Li, University of Virginia |
| Pseudocode | Yes | We outline the training process in Algorithm 1, with each phase introduced in the following sections. |
| Open Source Code | Yes | We provide the code at https://github.com/SongW-SW/MoD. |
| Open Datasets | Yes | Table 1: The datasets used in experiments and their corresponding tasks. # Train and # Validation denote the numbers of samples during training and validation, respectively. # Demo denotes the average number of demonstrations used in each task during validation. # Expert represents the number of experts used in each task. |
| Dataset Splits | Yes | Table 1: The datasets used in experiments and their corresponding tasks. # Train and # Validation denote the numbers of samples during training and validation, respectively. # Demo denotes the average number of demonstrations used in each task during validation. # Expert represents the number of experts used in each task. |
| Hardware Specification | Yes | We conduct experiments on two NVIDIA A100 GPUs, each with 80GB of memory. |
| Software Dependencies | No | The paper mentions software like Sentence-BERT, bert-base-uncased model, and Huggingface Transformers, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | To keep consistency with CEIL [57] and EPR [34], we primarily use GPT-Neo [4], a 2.7-billion-parameter language model trained on The Pile [10]... The number of in-context demonstrations in our experiments is set as 50... Regarding the experiments in this work, we use a batch size of 128 and a learning rate of $10^{-5}$. We set the size of the candidate demonstration set as $K = 50$. The size of the positive demonstration set is $\tilde{K} = 10$. |
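
For readers who want to carry the reported setup into their own scripts, the values quoted in the Experiment Setup row can be gathered into one configuration object. This is a minimal sketch and is not taken from the authors' repository: the class name `ReportedSetup` and every field name are hypothetical. The learning rate is read as 1e-5 on the assumption that the exponent sign was lost in text extraction, and the positive demonstration set size of 10 is assumed to correspond to the paper's tilde-K symbol.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row; all field names are hypothetical."""

    language_model: str = "GPT-Neo (2.7B parameters)"
    num_in_context_demos: int = 50   # demonstrations per prompt
    batch_size: int = 128
    learning_rate: float = 1e-5      # assumption: extracted text shows "10 5"
    candidate_set_size: int = 50     # K in the quoted text
    positive_set_size: int = 10      # assumed to be tilde-K in the quoted text

    def validate(self) -> None:
        """Reject non-positive values; this checks only basic sanity."""
        numeric_fields = {
            "num_in_context_demos": self.num_in_context_demos,
            "batch_size": self.batch_size,
            "learning_rate": self.learning_rate,
            "candidate_set_size": self.candidate_set_size,
            "positive_set_size": self.positive_set_size,
        }
        for name, value in numeric_fields.items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")


setup = ReportedSetup()
setup.validate()
print(asdict(setup))
```

Freezing the dataclass keeps the reported values from being changed by accident during a run. Any deviation from the paper then has to be made explicitly, for example with `dataclasses.replace(setup, batch_size=64)`.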