Mixture of Demonstrations for In-Context Learning
Authors: Song Wang, Zihan Chen, Chengshuai Shi, Cong Shen, Jundong Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate MoD via experiments across a range of NLP datasets and tasks, demonstrating its state-of-the-art performance and shedding new light on the future design of retrieval methods for ICL. |
| Researcher Affiliation | Academia | Song Wang (University of Virginia, sw3wv@virginia.edu); Zihan Chen (University of Virginia, brf3rx@virginia.edu); Chengshuai Shi (University of Virginia, cs7ync@virginia.edu); Cong Shen (University of Virginia, cong@virginia.edu); Jundong Li (University of Virginia, jundong@virginia.edu) |
| Pseudocode | Yes | We outline the training process in Algorithm 1, with each phase introduced in the following sections. |
| Open Source Code | Yes | We provide the code at https://github.com/SongW-SW/MoD. |
| Open Datasets | Yes | Table 1: The datasets used in experiments and their corresponding tasks. # Train and # Validation denote the numbers of samples during training and validation, respectively. # Demo denotes the average number of demonstrations used in each task during validation. # Expert represents the number of experts used in each task. |
| Dataset Splits | Yes | Table 1: The datasets used in experiments and their corresponding tasks. # Train and # Validation denote the numbers of samples during training and validation, respectively. # Demo denotes the average number of demonstrations used in each task during validation. # Expert represents the number of experts used in each task. |
| Hardware Specification | Yes | We conduct experiments on two NVIDIA A100 GPUs, each with 80GB of memory. |
| Software Dependencies | No | The paper mentions software like Sentence-BERT, bert-base-uncased model, and Huggingface Transformers, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | To keep consistency with CEIL [57] and EPR [34], we primarily use GPT-Neo [4], a 2.7-billion-parameter language model trained on The Pile [10]... The number of in-context demonstrations in our experiments is set as 50... Regarding the experiments in this work, we use a batch size of 128 and a learning rate of 10^-5. We set the size of the candidate demonstration set as K = 50. The size of the positive demonstration set is K̃ = 10. |
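
The quoted setup can be summarized as a small configuration sketch. The snippet below is a minimal illustration assuming Huggingface Transformers and Sentence-Transformers (the paper names these libraries but gives no versions); the specific checkpoints and variable names are assumptions for illustration, not taken from the MoD repository.

```python
# Minimal sketch of the reported experimental configuration.
# Checkpoint names and the `config` dict below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer

# Inference LLM quoted in the setup: GPT-Neo, 2.7B parameters, trained on The Pile.
lm_name = "EleutherAI/gpt-neo-2.7B"
tokenizer = AutoTokenizer.from_pretrained(lm_name)
llm = AutoModelForCausalLM.from_pretrained(lm_name)

# Sentence-BERT encoder mentioned for demonstration retrieval
# (the exact checkpoint is not specified in the paper; this one is an assumption).
encoder = SentenceTransformer("all-mpnet-base-v2")

# Hyperparameters quoted in the Experiment Setup row.
config = {
    "batch_size": 128,
    "learning_rate": 1e-5,
    "candidate_set_size_K": 50,       # size of the candidate demonstration set K
    "positive_set_size_K_tilde": 10,  # size of the positive demonstration set K̃
    "num_demonstrations": 50,         # in-context demonstrations per query
}
```

The retriever training itself (Algorithm 1 in the paper) is not reproduced here; the sketch only fixes the model and hyperparameter choices that the quotes make explicit.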