MEND: Meta Demonstration Distillation for Efficient and Effective In-Context Learning

Authors: Yichuan Li, Xiyao Ma, Sixing Lu, Kyumin Lee, Xiaohu Liu, Chenlei Guo

ICLR 2024

Reproducibility assessment: each entry below gives the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. Comprehensive evaluations across seven diverse ICL task partitions using decoder-only (GPT-2) and encoder-decoder (T5) attest to MEND's prowess.
Researcher Affiliation: Collaboration. Worcester Polytechnic Institute and Amazon Alexa AI ({yli29,kmlee}@wpi.edu; {maxiya,cynthilu,derecliu,guochenl}@amazon.com).
Pseudocode: No. The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps formatted like code.
Open Source Code: Yes. The code is available at https://github.com/bigheiniu/MEND.
Open Datasets: Yes. To validate our methodology, we employ the MetaICL dataset introduced by Min et al. (2022a), designed for in-context learning scenarios. MetaICL builds upon existing few-shot datasets, such as CrossFit (Ye et al., 2021) and UnifiedQA (Khashabi et al., 2020).
Dataset Splits: Yes. Notably, the MetaICL dataset is divided into two distinct partitions, meta-train and meta-test, with no overlap between them. This setting expects the model to be trained on meta-train first and then evaluated on the meta-test dataset (a minimal protocol sketch appears after this checklist).
Hardware Specification: Yes. All experiments were conducted on eight NVIDIA A10 GPUs, each equipped with 24GB of memory (see the hardware-check sketch after this checklist).
Software Dependencies: Yes. We implemented our proposed methodology using PyTorch v1.13.1 (Paszke et al., 2019), complemented by the Hugging Face Transformers library v4.24.0 (Wolf et al., 2019) and Accelerate v0.20.0 (Gugger et al., 2022) (a version-check sketch follows this checklist).
Experiment Setup: Yes. The complete set of stable hyperparameters for training runs can be found in Tab. 5. These parameters are adapted from MetaICL (Min et al., 2022a). Additional hyperparameters that needed exploration, and their corresponding search spaces, are also detailed in Tab. 5.
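
The Dataset Splits entry describes a train-on-meta-train, evaluate-on-meta-test protocol with disjoint task partitions. The sketch below only illustrates that protocol; the task names and the train/evaluate helpers are hypothetical placeholders, not the authors' actual partition or training code.

```python
# Hypothetical illustration of the MetaICL-style protocol: disjoint
# meta-train / meta-test task partitions, train first, then evaluate.

meta_train_tasks = ["task_a", "task_b", "task_c"]  # placeholder meta-train tasks
meta_test_tasks = ["task_x", "task_y"]             # placeholder held-out meta-test tasks

# The two partitions share no tasks.
assert not set(meta_train_tasks) & set(meta_test_tasks)

def meta_train(model, tasks):
    """Placeholder for training on the meta-train partition."""
    for task in tasks:
        pass  # fit the model on demonstrations drawn from each meta-train task
    return model

def meta_evaluate(model, tasks):
    """Placeholder for evaluation on unseen meta-test tasks."""
    return {task: None for task in tasks}  # one metric per held-out task

model = object()  # stand-in for the trained model
model = meta_train(model, meta_train_tasks)
scores = meta_evaluate(model, meta_test_tasks)
```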
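As a reproducibility aid for the Hardware Specification entry, the following PyTorch snippet lists the locally visible GPUs and their memory so they can be compared against the stated setup (eight NVIDIA A10 GPUs with 24 GB each). It is purely illustrative and not part of the authors' code.

```python
import torch

# List visible CUDA devices and their memory, to compare against the
# paper's stated hardware: eight NVIDIA A10 GPUs, 24 GB each.
print(f"visible GPUs: {torch.cuda.device_count()}")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"  cuda:{i} {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```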
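For the Software Dependencies entry, a minimal environment check against the quoted versions (PyTorch 1.13.1, Transformers 4.24.0, Accelerate 0.20.0). The version pins come from the paper; the checking code itself is only a sketch.

```python
import importlib

# Versions quoted in the paper's software-dependencies statement.
expected = {"torch": "1.13.1", "transformers": "4.24.0", "accelerate": "0.20.0"}

for package, wanted in expected.items():
    installed = importlib.import_module(package).__version__
    # startswith() tolerates local build suffixes such as "+cu117".
    status = "OK" if installed.startswith(wanted) else "MISMATCH"
    print(f"{package}: installed {installed}, expected {wanted} -> {status}")
```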