Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Discovering and Overcoming Limitations of Noise-engineered Data-free Knowledge Distillation
Authors: Piyush Raikwar, Deepak Mishra
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach on CIFAR10, CIFAR100, SVHN, and Food101 datasets. |
| Researcher Affiliation | Academia | Piyush Raikwar, ABV-IIITM, Gwalior, India; Deepak Mishra, IIT Jodhpur, India |
| Pseudocode | Yes | Algorithm 1 Training KD and Algorithm 2 Evaluation |
| Open Source Code | Yes | Code is available at: https://github.com/Piyush-555/Gaussian Distillation |
| Open Datasets | Yes | We validate our approach on CIFAR10, CIFAR100, SVHN, and Food101 datasets. |
| Dataset Splits | No | The paper mentions "validation set" once in passing, and refers to "test data" and "training data" subsets for finetuning, but does not provide specific train/validation/test split percentages, sample counts, or explicit predefined split citations for reproducibility of data partitioning. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory amounts, or detailed computer specifications) are provided. |
| Software Dependencies | No | The paper mentions using "Adam optimizer" but does not provide specific software dependencies like framework versions (e.g., PyTorch 1.9) or other library versions needed for replication. |
| Experiment Setup | Yes | In both cases, the batch size is 256, and an Adam optimizer with a learning rate of 10^-3 for tuning the parameters of the student network is used. For finetuning, a subset of training data is sampled randomly and a reduced learning rate of 10^-4 is used. |
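The Experiment Setup row reports only a few hyperparameters. The sketch below collects them in one place as a minimal, hypothetical Python configuration: batch size 256, Adam learning rate 10^-3 for the student network, and a reduced learning rate of 10^-4 for finetuning on a randomly sampled subset of training data. The names `DistillationConfig`, `sample_finetune_subset`, and `num_batches` are illustrative assumptions and are not taken from the authors' code. The 10% finetuning fraction is also an assumption, because the report does not state the subset size.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DistillationConfig:
    batch_size: int = 256                 # reported batch size (both settings)
    learning_rate: float = 1e-3           # reported Adam learning rate for the student
    finetune_learning_rate: float = 1e-4  # reported reduced learning rate for finetuning
    finetune_fraction: float = 0.1        # HYPOTHETICAL: subset size is not reported


def sample_finetune_subset(num_train: int, fraction: float, seed: int = 0) -> list[int]:
    """Randomly sample training-set indices (without replacement) for finetuning."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(num_train * fraction))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(range(num_train), k))


def num_batches(num_examples: int, batch_size: int) -> int:
    """Mini-batches per epoch, counting a final partial batch (ceiling division)."""
    return -(-num_examples // batch_size)
```

For example, a 10% subset of the 50,000-image CIFAR10 training set yields 5,000 indices. At a batch size of 256, that is 20 mini-batches per finetuning epoch. Fixing the sampling seed is a design choice aimed at the reproducibility gap the table highlights, since the report does not say how the subset was drawn.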