Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
Authors: Yu-Liang Zhan, Zhong-Yi Lu, Hao Sun, Ze-Feng Gao
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments validate the significant performance enhancement by our approach in various KD tasks, covering computer vision and natural language processing areas. Our code is available at https://github.com/intell-sci-comput/OPDF. |
| Researcher Affiliation | Academia | Yu-Liang Zhan, Gaoling School of Artificial Intelligence, Renmin University of China (zhanyuliang@ruc.edu.cn); Zhong-Yi Lu, School of Physics, Renmin University of China (zlu@ruc.edu.cn); Hao Sun, Gaoling School of Artificial Intelligence, Renmin University of China (haosun@ruc.edu.cn); Ze-Feng Gao, School of Physics, Renmin University of China (zfgao@ruc.edu.cn) |
| Pseudocode | Yes | Appendix C (Algorithms): "The over-parameterized distillation framework algorithm is shown in Algorithm S.1." ... "The MPO pseudocode is shown in Algorithm S.2: MPO decomposition for a matrix." (An illustrative sketch of such an MPO decomposition is given after this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/intell-sci-comput/OPDF. |
| Open Datasets | Yes | Datasets and Metrics: "For NLP tasks, we evaluate our approach on text classification tasks in the GLUE benchmark [57]." ... "In the context of CV tasks, we have applied the OPDF to the distillation of Vision Transformers (ViT) for image classification [7]. This was done using the ImageNet-21k [58], ImageNet-1k, ImageNet-Real [59], and ImageNetV2 [60] datasets." |
| Dataset Splits | Yes | We report the performance of the model that achieves the best results on the validation set when applied to the test set. |
| Hardware Specification | Yes | In NLP tasks, our method takes half to two GPU hours on an A100 GPU. ... In CV tasks, our method takes 160.0 GPU days on A100 GPUs to pretrain TinyViT-21M. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | The detailed hyperparameter settings for these NLP distillation models are provided in Tables S.3 and S.4. ... The specific experimental parameters utilized are detailed in Table S.5. |
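The Pseudocode row above points to Algorithm S.2, "MPO decomposition for a matrix." As a rough illustration of the general technique only (not the authors' Algorithm S.2 or the OPDF codebase), the sketch below factors a 2-D weight matrix into a chain of local tensors via repeated reshape-and-SVD; the function name `mpo_decompose`, its argument names, and the truncation strategy are assumptions made for this example.

```python
import numpy as np

def mpo_decompose(weight, in_dims, out_dims, bond_dim=None):
    """Factor a 2-D weight matrix into a chain of 4-way local tensors
    (a matrix product operator) via repeated reshape + truncated SVD.

    in_dims / out_dims are factorizations of the matrix shape, e.g. a
    768x768 matrix with in_dims = out_dims = (4, 12, 16).  bond_dim,
    if given, truncates every internal bond to at most that rank.
    """
    assert weight.shape == (int(np.prod(in_dims)), int(np.prod(out_dims)))
    n = len(in_dims)

    # Reshape to (i1,...,in, o1,...,on), then interleave to (i1,o1, i2,o2, ...).
    tensor = weight.reshape(list(in_dims) + list(out_dims))
    perm = [k for pair in zip(range(n), range(n, 2 * n)) for k in pair]
    tensor = np.transpose(tensor, perm)

    cores, bond = [], 1
    for k in range(n - 1):
        # Split off the k-th (input, output) index pair with an SVD.
        mat = tensor.reshape(bond * in_dims[k] * out_dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = len(s) if bond_dim is None else min(len(s), bond_dim)
        cores.append(u[:, :rank].reshape(bond, in_dims[k], out_dims[k], rank))
        tensor = np.diag(s[:rank]) @ vt[:rank]  # carry the remainder rightward
        bond = rank
    cores.append(tensor.reshape(bond, in_dims[-1], out_dims[-1], 1))
    return cores  # local tensors of shape (bond_left, i_k, o_k, bond_right)
```

Contracting the cores in order and reshaping recovers the original matrix up to the SVD truncation; with untruncated bonds the reconstruction is exact.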