Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation

Authors: Yu-Liang Zhan, Zhong-Yi Lu, Hao Sun, Ze-Feng Gao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments validate the significant performance enhancement by our approach in various KD tasks, covering computer vision and natural language processing areas. Our code is available at https://github.com/intell-sci-comput/OPDF.
Researcher Affiliation | Academia | Yu-Liang Zhan, Gaoling School of Artificial Intelligence, Renmin University of China, zhanyuliang@ruc.edu.cn; Zhong-Yi Lu, School of Physics, Renmin University of China, zlu@ruc.edu.cn; Hao Sun, Gaoling School of Artificial Intelligence, Renmin University of China, haosun@ruc.edu.cn; Ze-Feng Gao, School of Physics, Renmin University of China, zfgao@ruc.edu.cn
Pseudocode | Yes | C Algorithms: The over-parameterized distillation framework algorithm is shown in Algorithm S.1 (Over-parameterized distillation framework). ... The MPO pseudocode is shown in Algorithm S.2 (MPO decomposition for a matrix). Illustrative sketches of both algorithms are given after this table.
Open Source Code | Yes | Our code is available at https://github.com/intell-sci-comput/OPDF.
Open Datasets | Yes | Datasets and Metrics: For NLP tasks, we evaluate our approach on text classification tasks in the GLUE benchmark [57]. ... In the context of CV tasks, we have applied the OPDF to the distillation of Vision Transformers (ViT) for image classification [7]. This was done using the ImageNet-21k dataset [58], ImageNet-1k, ImageNet-Real [59], and ImageNet-V2 [60] datasets.
Dataset Splits | Yes | We report the performance of the model that achieves the best results on the validation set when applied to the test set.
Hardware Specification | Yes | In NLP tasks, our method takes half to two GPU hours on an A100 GPU. ... In CV tasks, our method takes 160.0 GPU days on A100 GPUs to pretrain TinyViT-21M.
Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | The detailed hyperparameter settings for these NLP distillation models are provided in Tables S.3 and S.4. ... The specific experimental parameters utilized are detailed in Table S.5.
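
The Pseudocode row above points to Algorithm S.2 (MPO decomposition for a matrix) but does not reproduce it. The snippet below is a minimal sketch of one standard way to factor a weight matrix into matrix product operator (MPO) cores via sequential SVDs; the function name `mpo_decompose`, the factor tuples, and the `max_bond` truncation rule are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def mpo_decompose(W, in_factors, out_factors, max_bond=None):
    """Factor W (prod(in_factors) x prod(out_factors)) into 4-way local tensors
    of shape (D_left, i_k, j_k, D_right) via sequential, optionally truncated SVDs."""
    n = len(in_factors)
    assert len(out_factors) == n
    assert W.shape == (int(np.prod(in_factors)), int(np.prod(out_factors)))
    # Reshape to (i_1,...,i_n, j_1,...,j_n), then interleave to (i_1, j_1, ..., i_n, j_n).
    T = W.reshape(*in_factors, *out_factors)
    T = T.transpose([x for k in range(n) for x in (k, n + k)])
    cores, bond = [], 1
    for k in range(n - 1):
        # Split off the k-th (i_k, j_k) index pair; everything else stays on the right.
        M = T.reshape(bond * in_factors[k] * out_factors[k], -1)
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        rank = len(S) if max_bond is None else min(len(S), max_bond)
        cores.append(U[:, :rank].reshape(bond, in_factors[k], out_factors[k], rank))
        T = np.diag(S[:rank]) @ Vt[:rank]          # remainder, to be split in the next step
        bond = rank
    cores.append(T.reshape(bond, in_factors[-1], out_factors[-1], 1))
    return cores

# Quick self-check: with no truncation, contracting the cores reproduces W exactly.
W = np.random.randn(16, 81)
cores = mpo_decompose(W, (4, 4), (9, 9))
recon = cores[0]
for core in cores[1:]:
    recon = np.tensordot(recon, core, axes=([-1], [0]))
recon = recon.squeeze(0).squeeze(-1)               # -> (i_1, j_1, i_2, j_2)
recon = recon.transpose(0, 2, 1, 3).reshape(16, 81)
print(np.allclose(recon, W))                       # True
```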
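
Algorithm S.1 (the over-parameterized distillation framework) is likewise only referenced above. As a rough sketch of the kind of update such a framework performs — assuming a PyTorch-style setup with the standard softened-logit KD loss, and hypothetical names `student`, `teacher`, `T`, and `alpha` — one training step could look like the following; it is not the authors' Algorithm S.1.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, batch, optimizer, T=2.0, alpha=0.5):
    """One KD update: soft-label KL term against the teacher plus the hard-label task loss."""
    inputs, labels = batch
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(inputs)
    s_logits = student(inputs)          # student layers would hold the over-parameterized MPO factors
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(s_logits, labels)
    loss = alpha * kd + (1.0 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a scheme of this kind the MPO factors exist only during training; contracting them back into dense weights afterwards (as in the reconstruction check above) leaves the deployed student's size and inference cost unchanged.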