Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
Authors: Yu-Liang Zhan, Zhong-Yi Lu, Hao Sun, Ze-Feng Gao
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments validate the significant performance enhancement by our approach in various KD tasks, covering computer vision and natural language processing areas. Our code is available at https://github.com/intell-sci-comput/OPDF. |
| Researcher Affiliation | Academia | Yu-Liang Zhan, Gaoling School of Artificial Intelligence, Renmin University of China (zhanyuliang@ruc.edu.cn); Zhong-Yi Lu, School of Physics, Renmin University of China (zlu@ruc.edu.cn); Hao Sun, Gaoling School of Artificial Intelligence, Renmin University of China (haosun@ruc.edu.cn); Ze-Feng Gao, School of Physics, Renmin University of China (zfgao@ruc.edu.cn) |
| Pseudocode | Yes | Appendix C (Algorithms): "The over-parameterized distillation framework algorithm is shown in Algorithm S.1." ... "The MPO pseudocode is shown in Algorithm S.2: MPO decomposition for a matrix." (An illustrative sketch of such an MPO decomposition is given after this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/intell-sci-comput/OPDF. |
| Open Datasets | Yes | Datasets and Metrics: "For NLP tasks, we evaluate our approach on text classification tasks in the GLUE benchmark [57]." ... "In the context of CV tasks, we have applied the OPDF to the distillation of Vision Transformers (ViT) for image classification [7]. This was done using the ImageNet-21k [58], ImageNet-1k, ImageNet-Real [59], and ImageNetV2 [60] datasets." |
| Dataset Splits | Yes | We report the performance of the model that achieves the best results on the validation set when applied to the test set. |
| Hardware Specification | Yes | In NLP tasks, our method takes half to two GPU hours on an A100 GPU. ... In CV tasks, our method takes 160.0 GPU days on A100 GPUs to pretrain TinyViT-21M. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | The detailed hyperparameter settings for these NLP distillation models are provided in Tables S.3 and S.4. ... The specific experimental parameters utilized are detailed in Table S.5. |
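The Pseudocode row above points to Algorithm S.2, "MPO decomposition for a matrix." As a rough illustration of the general technique only (not the authors' Algorithm S.2 or the OPDF codebase), the sketch below factors a 2-D weight matrix into a chain of local tensors via repeated reshape-and-SVD; the function name `mpo_decompose`, its argument names, and the truncation strategy are assumptions made for this example.

```python
import numpy as np

def mpo_decompose(weight, in_dims, out_dims, bond_dim=None):
    """Factor a 2-D weight matrix into a chain of 4-way local tensors
    (a matrix product operator) via repeated reshape + truncated SVD.

    in_dims / out_dims are factorizations of the matrix shape, e.g. a
    768x768 matrix with in_dims = out_dims = (4, 12, 16).  bond_dim,
    if given, truncates every internal bond to at most that rank.
    """
    assert weight.shape == (int(np.prod(in_dims)), int(np.prod(out_dims)))
    n = len(in_dims)

    # Reshape to (i1,...,in, o1,...,on), then interleave to (i1,o1, i2,o2, ...).
    tensor = weight.reshape(list(in_dims) + list(out_dims))
    perm = [k for pair in zip(range(n), range(n, 2 * n)) for k in pair]
    tensor = np.transpose(tensor, perm)

    cores, bond = [], 1
    for k in range(n - 1):
        # Split off the k-th (input, output) index pair with an SVD.
        mat = tensor.reshape(bond * in_dims[k] * out_dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = len(s) if bond_dim is None else min(len(s), bond_dim)
        cores.append(u[:, :rank].reshape(bond, in_dims[k], out_dims[k], rank))
        tensor = np.diag(s[:rank]) @ vt[:rank]  # carry the remainder rightward
        bond = rank
    cores.append(tensor.reshape(bond, in_dims[-1], out_dims[-1], 1))
    return cores  # local tensors of shape (bond_left, i_k, o_k, bond_right)
```

Contracting the cores in order and reshaping recovers the original matrix up to the SVD truncation; with untruncated bonds the reconstruction is exact.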