Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
DeepKD: A Deeply Decoupled and Denoised Knowledge Distillation Trainer
Authors: Haiduo Huang, Jiangcheng Song, Yadong Zhang, Pengju Ren
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CIFAR-100, ImageNet, and MS-COCO demonstrate DeepKD's effectiveness. |
| Researcher Affiliation | Academia | Haiduo Huang, Jiangcheng Song, Yadong Zhang, Pengju Ren. Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University. EMAIL, EMAIL |
| Pseudocode | Yes | In this section, we provide the pseudocode for our proposed DeepKD framework, which includes the main algorithm (Algorithm 1) and the Dynamic Top-K Masking (DTM) strategy (Algorithm 2). |
| Open Source Code | Yes | This paper presents comprehensive experimental configurations and implementation details. The open-source code and trained checkpoints will be made available to facilitate reproducibility. |
| Open Datasets | Yes | We conduct comprehensive evaluations on three widely-used benchmarks: CIFAR-100 [50] (100 classes, 50k training/10k validation 32×32 images), ImageNet-1K [51] (1,000 classes, 1.28M/50k images cropped to 224×224), and MS-COCO [52] (80-class detection, 118k training/5k validation images). |
| Dataset Splits | Yes | We conduct comprehensive evaluations on three widely-used benchmarks: CIFAR-100 [50] (100 classes, 50k training/10k validation 32×32 images), ImageNet-1K [51] (1,000 classes, 1.28M/50k images cropped to 224×224), and MS-COCO [52] (80-class detection, 118k training/5k validation images). |
| Hardware Specification | Yes | All experiments were conducted on a system equipped with an Nvidia RTX 4090 GPU and an AMD 64-Core Processor CPU. All experiments use a single 2080Ti GPU for CIFAR-100 and two RTX 4090 GPUs for training on ImageNet-1K. |
| Software Dependencies | No | For implementation, we follow standard practices using SGD optimizer with momentum 0.9 and weight decay of 5×10⁻⁴ (CIFAR) or 1×10⁻⁴ (ImageNet). Algorithm 1: Pseudocode of DeepKD Gradient Decoupling in a PyTorch-like style. No specific version numbers for software libraries or environments are provided. |
| Experiment Setup | Yes | For implementation, we follow standard practices using SGD optimizer with momentum 0.9 and weight decay of 5×10⁻⁴ (CIFAR) or 1×10⁻⁴ (ImageNet). The training schedule varies by dataset: CIFAR uses 240 epochs with batch size 64 and initial learning rate 0.01-0.05, while ImageNet uses 100 epochs with batch size 512 and learning rate 0.2. |
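
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the names `TrainConfig`, `CIFAR100`, `IMAGENET`, and `sgd_kwargs` are hypothetical, and the CIFAR learning rate is reported as a range (0.01-0.05), so the value 0.05 below is an assumed pick from that range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the training settings quoted above."""
    epochs: int
    batch_size: int
    lr: float            # initial learning rate
    weight_decay: float
    momentum: float = 0.9  # reported for both datasets


# CIFAR: 240 epochs, batch size 64, weight decay 5e-4.
# The reported learning rate is a range (0.01-0.05); 0.05 is an assumption.
CIFAR100 = TrainConfig(epochs=240, batch_size=64, lr=0.05, weight_decay=5e-4)

# ImageNet: 100 epochs, batch size 512, learning rate 0.2, weight decay 1e-4.
IMAGENET = TrainConfig(epochs=100, batch_size=512, lr=0.2, weight_decay=1e-4)


def sgd_kwargs(cfg: TrainConfig) -> dict:
    """Return keyword arguments in the form an SGD optimizer would expect."""
    return {
        "lr": cfg.lr,
        "momentum": cfg.momentum,
        "weight_decay": cfg.weight_decay,
    }
```

With PyTorch installed, these values could be passed along the lines of `torch.optim.SGD(model.parameters(), **sgd_kwargs(CIFAR100))`. The learning-rate schedule and the exact PyTorch version are not stated in the rows above, so they are left out here.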