reproducibilityindex.ai

MiniLLM: Knowledge Distillation of Large Language Models

Authors: Yuxian Gu, Li Dong, Furu Wei, Minlie Huang

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments in the instruction-following setting show that MiniLLM generates more precise responses with higher overall quality, lower exposure bias, better calibration, and higher long-text generation performance than the baselines.
Researcher Affiliation	Collaboration	Yuxian Gu (1, 2), Li Dong (2), Furu Wei (2), Minlie Huang (1). (1) The CoAI Group, Tsinghua University; (2) Microsoft Research
Pseudocode	Yes	Algorithm 1 MiniLLM: Knowledge Distillation of LLMs
Open Source Code	Yes	Our code, data, and model checkpoints can be found in https://github.com/microsoft/LMOps/tree/main/minillm.
Open Datasets	Yes	We construct the training data from databricks-dolly-15K (footnote 3) consisting of 15K human-written instruction-response pairs. (...) Footnote 3: https://github.com/databrickslabs/dolly/tree/master
Dataset Splits	Yes	Then, we randomly split 0.5K and 1K samples for validation and testing, respectively, leaving about 12.5K examples for training.
Hardware Specification	Yes	Our experiments are based on the NVIDIA V100 32G GPUs.
Software Dependencies	No	The paper does not specify version numbers for key software components such as Python, PyTorch, or CUDA.
Experiment Setup	Yes	Phase 2: We continuously train the model from Phase 1 as described in Algorithm 1 using a learning rate 5e-6, a mini-batch size 64 in all cases. The clipping rate ϵ is set to 0.2, and the max length of the model is 512. We use temperature = 1 when sampling from qθ.
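The Dataset Splits row can be illustrated with a short sketch. This is a hypothetical helper, not code from the MiniLLM repository: it shuffles a copy of the example pool with a fixed seed (the seed value is an assumption; the page does not give one) and carves off 500 validation and 1,000 test examples, leaving the rest for training. Working backwards from the quoted figures, about 12.5K training examples plus 1.5K held out implies a pool of roughly 14K, so the usage note assumes 14,000 placeholder items.

```python
import random


def split_dataset(examples, n_valid=500, n_test=1000, seed=42):
    """Randomly split examples into (train, valid, test).

    n_valid and n_test follow the 0.5K / 1K split quoted in the table;
    the default seed is a hypothetical choice for repeatability.
    """
    if n_valid + n_test > len(examples):
        raise ValueError("not enough examples for the requested splits")
    shuffled = list(examples)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # local RNG: no global state touched
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]  # everything left over goes to training
    return train, valid, test
```

With `pool = list(range(14_000))`, `split_dataset(pool)` returns splits of 12,500, 500, and 1,000 items. Using a seeded `random.Random` instance rather than the module-level functions keeps the split repeatable without affecting other code that draws random numbers.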
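The Phase 2 settings in the Experiment Setup row can be collected into a config, and two of them (the clipping rate ϵ and the sampling temperature) can be shown in isolation. This sketch assumes the clipping rate is used in the standard PPO-style clipped surrogate, min(r·A, clip(r, 1-ϵ, 1+ϵ)·A), where r is the ratio of new to old policy probabilities and A an advantage-like signal; the row itself does not spell this out. The config field names and both helper functions are hypothetical, not taken from the MiniLLM code.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase2Config:
    # Values quoted in the Experiment Setup row; field names are hypothetical.
    learning_rate: float = 5e-6
    mini_batch_size: int = 64
    clip_eps: float = 0.2     # clipping rate epsilon
    max_length: int = 512     # max sequence length of the model
    temperature: float = 1.0  # temperature when sampling from q_theta


def clipped_surrogate(ratio: float, advantage: float, eps: float) -> float:
    """Assumed PPO-style clipped objective for one token (to be maximized).

    ratio: q_theta(token) / q_old(token); advantage: per-token reward signal.
    Taking the min of the raw and clipped terms gives a pessimistic bound.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def sample_with_temperature(logits: list[float], temperature: float,
                            rng: random.Random) -> int:
    """Sample a token index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

For example, with ϵ = 0.2 a ratio of 1.5 and advantage 2.0 is capped at 1.2 × 2.0 = 2.4, while a ratio of 0.5 with advantage -1.0 yields the more pessimistic -0.8. Temperature 1, as quoted, samples directly from the unscaled softmax of the student's logits.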