Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
Authors: Yu-Liang Zhan, Zhong-Yi Lu, Hao Sun, Ze-Feng Gao
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments validate the significant performance enhancement by our approach in various KD tasks, covering computer vision and natural language processing areas. Our code is available at https://github.com/intell-sci-comput/OPDF. |
| Researcher Affiliation | Academia | Yu-Liang Zhan Gaoling School of Artificial Intelligence Renmin University of China EMAIL Zhong-Yi Lu School of Physics Renmin University of China EMAIL Hao Sun Gaoling School of Artificial Intelligence Renmin University of China EMAIL Ze-Feng Gao School of Physics Renmin University of China EMAIL |
| Pseudocode | Yes | C Algorithms The over-parameterized distillation framework algorithm is shown in Algorithm S.1. Algorithm S.1 Over-parameterized distillation framework. ... The MPO pseudocode is shown in Algorithm S.2. Algorithm S.2 MPO decomposition for a matrix. |
| Open Source Code | Yes | Our code is available at https://github.com/intell-sci-comput/OPDF. |
| Open Datasets | Yes | Datasets and Metrics For NLP tasks, we evaluate our approach on text classification tasks in GLUE benchmark [57]. ... In the context of CV tasks, we have applied the OPDF to the distillation of Vision Transformers (ViT) for image classification [7]. This was done using the ImageNet-21k dataset [58], ImageNet-1k, ImageNet Real [59], and ImageNet V2 [60] datasets. |
| Dataset Splits | Yes | We report the performance of the model that achieves the best results on the validation set when applied to the test set. |
| Hardware Specification | Yes | In NLP tasks, our method takes half to two GPU hours on A100 GPU. ... In CV tasks, our method takes 160.0 GPU days on A100 GPUs to pretrain TinyViT-21M. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | The detailed hyperparameter settings for these NLP distillation models are provided in Table S.3 and S.4. ... The specific experimental parameters utilized are detailed in Table S.5. |