Adversarial Moment-Matching Distillation of Large Language Models
Authors: Chen Jia
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results from both task-agnostic instruction-following experiments and task-specific experiments demonstrate the effectiveness of our method and achieve new state-of-the-art performance. Empirically, we evaluate our approach on both the instruction-following dataset and three task-specific datasets for text summarization, machine translation, and commonsense reasoning. |
| Researcher Affiliation | Industry | Chen Jia, SI-TECH Information Technology, jiachenwestlake@gmail.com |
| Pseudocode | Yes | Algorithm 1: Adversarial training procedure (Page 5). A hedged sketch of such a loop appears after the table. |
| Open Source Code | Yes | The code and implementation are released at https://github.com/jiachenwestlake/MMKD. |
| Open Datasets | Yes | We construct the training data from databricks-dolly-15k [8], where we randomly select 15K samples for training and equally split 500 samples for validation and testing. We also add the Open Web Text [13] corpus. For the text summarization task, we follow Ko et al. [21] to conduct experiments on the SAMSum [12] dataset. For the machine translation tasks, we follow Ko et al. [21] to conduct experiments on the IWSLT 17 (en-de) [5] dataset. For the commonsense reasoning task, we conduct experiments on the Strategy QA dataset [11]. |
| Dataset Splits | Yes | We construct the training data from databricks-dolly-15k [8], where we randomly select 15K samples for training and equally split 500 samples for validation and testing. (A split sketch appears after the table.) |
| Hardware Specification | Yes | We use NVIDIA A40 GPUs with 40GB RAM to conduct all the experiments. (Appendix B.1) |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, etc.). |
| Experiment Setup | Yes | More details about the experimental setup are given in Appendix B. (Tables 3 and 4 in Appendix B list detailed hyperparameters such as Max. Step Size, Inner Step Size, Batch Size, Learning Rate, etc.; an illustrative config stub follows the sketches below.) |
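
The pseudocode row above points to Algorithm 1, the paper's adversarial training procedure. Below is a minimal sketch of what an adversarial moment-matching distillation loop can look like: an inner loop trains a critic to expose the moment gap between teacher and student behavior, and an outer REINFORCE-style step updates the student to shrink it. Everything here (`SeqCritic`, `adversarial_mm_step`, the sampling settings) is an illustrative assumption, not the paper's API; the actual Algorithm 1 is in the released code at https://github.com/jiachenwestlake/MMKD.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqCritic(nn.Module):
    """Hypothetical critic scoring whole sequences of token ids.
    Stands in for the moment estimator the adversary maximizes over."""
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, ids):                       # ids: (B, T)
        return self.head(self.emb(ids).mean(dim=1)).squeeze(-1)  # (B,)

def adversarial_mm_step(student, teacher, critic, prompts,
                        opt_student, opt_critic, inner_steps=5):
    """One outer iteration of an adversarial moment-matching loop (sketch).

    student / teacher are assumed to be HuggingFace-style causal LMs;
    `inner_steps` plays the role of the paper's "Inner Step Size"."""
    # Sample responses from both policies.
    with torch.no_grad():
        t_out = teacher.generate(prompts, max_new_tokens=64, do_sample=True)
        s_out = student.generate(prompts, max_new_tokens=64, do_sample=True)

    # Inner loop: the critic ascends on the estimated moment gap
    # E_teacher[f] - E_student[f].
    for _ in range(inner_steps):
        gap = critic(t_out).mean() - critic(s_out).mean()
        opt_critic.zero_grad()
        (-gap).backward()                         # gradient ascent on the gap
        opt_critic.step()

    # Outer step: sampling is not differentiable, so update the student with a
    # REINFORCE-style estimator: raise the log-likelihood of its own samples in
    # proportion to the critic score (higher score = closer to teacher moments).
    # For brevity the log-prob includes prompt tokens; a real implementation
    # would mask them and subtract a baseline to reduce variance.
    logits = student(s_out).logits[:, :-1]
    logp = torch.gather(F.log_softmax(logits, dim=-1), 2,
                        s_out[:, 1:].unsqueeze(-1)).squeeze(-1).sum(dim=-1)
    reward = critic(s_out).detach()
    loss = -(reward * logp).mean()
    opt_student.zero_grad()
    loss.backward()
    opt_student.step()
    return loss.item()
```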
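The dataset-split rows report 15K databricks-dolly-15k samples with 500 held out for validation and testing; the sketch below reads "equally split 500 samples" as 500 for each held-out set, though 250/250 is also a possible reading. The seed and file path are assumptions, not values from the paper.

```python
import json
import random

# Hedged sketch of the reported split; seed and path are illustrative only.
random.seed(42)

with open("databricks-dolly-15k.jsonl", encoding="utf-8") as f:
    data = [json.loads(line) for line in f]

random.shuffle(data)
valid, test, train = data[:500], data[500:1000], data[1000:]
print(len(train), len(valid), len(test))  # dolly-15k yields roughly 14K train
```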
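Finally, the experiment-setup row points to Tables 3 and 4 in Appendix B for hyperparameters. The stub below only illustrates how those fields could map onto the training loop sketched above; the values are placeholders, not the paper's settings.

```python
# Placeholder values -- consult Tables 3 and 4 in Appendix B for the real ones.
config = {
    "max_step_size": 10_000,   # "Max. Step Size": number of outer iterations
    "inner_step_size": 5,      # "Inner Step Size": critic updates per outer step
    "batch_size": 32,
    "learning_rate": 5e-5,
}
# e.g. adversarial_mm_step(..., inner_steps=config["inner_step_size"])
```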