Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity

Authors: Seonghoon Yu, Dongjun Nam, Dina Katabi, Jeany Son

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments on various KD benchmark datasets: 1) CIFAR100 [24], a 100-class image classification dataset containing 50,000 training and 10,000 validation images of size 32×32, 2) ImageNet [10]: a large-scale classification dataset with 1,000 categories, approximately 1.28 million training and 50,000 validation images, each of size 224×224, and 3) Imbalanced CIFAR-100, following prior works [50, 20], where 43 classes out of 100 CIFAR classes are selected, and each class is limited to 50 training samples. The imbalanced classes can be found in our supplementary. 4) STL-10 [8], a 10-class image classification dataset with an image size of 96×96, comprising 5,000 training images and 8,000 test images. 5) Tiny ImageNet [10], a 200-class subset of ImageNet, with each class containing 500 training images, 50 validation images, and 50 test images with a size of 64×64. For the evaluation metric, we use top-1 classification accuracy (%).
Researcher Affiliation	Academia	GIST; POSTECH; MIT CSAIL
Pseudocode	No	The paper describes the proposed methods and their components in detail within Section 2, such as '2.1 Single-teacher View Augmentation Heads' and '2.2 Angular Losses for Learning Diverse Representations'. However, it does so through descriptive text rather than presenting structured pseudocode or algorithm blocks.
Open Source Code	Yes	https://github.com/june6423/Angular-KD (URL provided at the beginning of the paper). Additionally, the NeurIPS checklist states: 'Codes are included in the supplementary material Zip file. We will release our code on GitHub in the future.'
Open Datasets	Yes	We conduct experiments on various KD benchmark datasets: 1) CIFAR100 [24], 2) ImageNet [10]: a large-scale classification dataset with 1,000 categories, [...] 3) Imbalanced CIFAR-100, following prior works [50, 20], [...] 4) STL-10 [8], [...] 5) Tiny ImageNet [10], [...] and 6) Carvana Image Masking dataset [44]
Dataset Splits	Yes	1) CIFAR100 [24], a 100-class image classification dataset containing 50,000 training and 10,000 validation images of size 32×32, 2) ImageNet [10]: a large-scale classification dataset with 1,000 categories, approximately 1.28 million training and 50,000 validation images, each of size 224×224, and 4) STL-10 [8], a 10-class image classification dataset with an image size of 96×96, comprising 5,000 training images and 8,000 test images. [...] 6) Carvana Image Masking dataset [44], which includes 5,088 training images and 100,064 test images. [...] To construct the imbalanced CIFAR-100 dataset, we employ the same class selection protocol as TeKAP [20]. In detail, we select 43 classes as imbalanced classes among a total of 100 classes. For each of these classes, we choose the first 50 training samples in sequence (without shuffling) for reproducibility and discard the rest. The remaining 57 classes keep their full set of 500 training images. [...] To create the limited-data scenarios, we randomly sample 25%, 50%, and 75% of CIFAR-100 training data. For reproducibility, we select the first 25%, 50%, and 75% samples in sequence (without shuffling).
Hardware Specification	Yes	The student model is trained for 240 epochs on a single RTX 2080 Ti GPU using SGD optimizer, and a batch size of 64.
Software Dependencies	No	The paper mentions following the 'standard PyTorch training schedule' for ImageNet experiments but does not specify version numbers for PyTorch or any other software libraries used. For example, it does not state 'PyTorch 1.x' or 'Python 3.x'.
Experiment Setup	Yes	Implementation Details. For view augmentation heads (Sec. 2.1), we generate N = 5 augmented views, apply dropout with probabilities {0.2, 0.25, 0.3, 0.35, 0.4}, and use a softmax temperature of τ_Z = 4. For constrained inter-angle diversity loss (Sec. 2.2), the learnable margin γ is initialized to 0.2, and the contrastive temperature is set to τ_C = 0.07. For the ensembling (Sec. 2.3), we use uniform weights across all ensemble members. Training starts with a 30-epoch warm-up phase where only the view augmentation heads are trained to ensure stability. Subsequently, the student model is trained for 240 epochs on a single RTX 2080 Ti GPU using SGD optimizer, and a batch size of 64. The learning rate starts at 0.01 and is decayed by a factor of 10 at epochs 150, 180, and 210.
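The subset protocols in the Dataset Splits row (the first 50 training samples of each of 43 imbalanced classes, and the first 25%, 50%, or 75% of the training set, all taken in sequence without shuffling) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the input is assumed to be the class labels in the dataset's native order, and the actual list of 43 imbalanced classes is given only in the paper's supplementary material, so any class set passed in here is a placeholder.

```python
from typing import List, Sequence, Set


def imbalanced_subset_indices(
    labels: Sequence[int],
    imbalanced_classes: Set[int],
    cap: int = 50,
) -> List[int]:
    """Hypothetical helper: keep the first `cap` samples (dataset order, no
    shuffling) of each class in `imbalanced_classes`, and every sample of
    all other classes. Returns indices into the original dataset."""
    taken = {c: 0 for c in imbalanced_classes}  # samples kept so far per capped class
    kept: List[int] = []
    for idx, y in enumerate(labels):
        if y in taken:
            if taken[y] < cap:
                taken[y] += 1
                kept.append(idx)
            # samples beyond the cap are discarded
        else:
            kept.append(idx)  # non-imbalanced classes keep all samples
    return kept


def first_fraction_indices(num_samples: int, fraction: float) -> List[int]:
    """Hypothetical helper: the first `fraction` of samples in sequence
    (floor of num_samples * fraction), with no shuffling."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return list(range(int(num_samples * fraction)))
```

With a class-balanced toy label list of 50,000 samples (500 per class) and placeholder classes 0 to 42, the imbalanced subset keeps 43 × 50 + 57 × 500 = 30,650 samples, and the 25% limited-data split keeps the first 12,500 indices. The returned indices could then be passed to a wrapper such as torch.utils.data.Subset.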
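The optimization settings in the Experiment Setup row can be collected into a small configuration together with the step-decay rule. The sketch below is hypothetical: the field names are illustrative, and it assumes epochs are counted from 0 at the start of the 240-epoch student phase (after the 30-epoch warm-up), with each decay taking effect at the start of epochs 150, 180, and 210. That assumption matches how PyTorch's torch.optim.lr_scheduler.MultiStepLR behaves with milestones=[150, 180, 210] and gamma=0.1 when stepped once per epoch.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReportedSetup:
    # Values as reported in the Experiment Setup row; field names are hypothetical.
    num_views: int = 5
    dropout_probs: Tuple[float, ...] = (0.2, 0.25, 0.3, 0.35, 0.4)
    tau_z: float = 4.0        # softmax temperature of the view augmentation heads
    margin_init: float = 0.2  # initial value of the learnable margin gamma
    tau_c: float = 0.07       # contrastive temperature
    warmup_epochs: int = 30   # only the view augmentation heads are trained
    student_epochs: int = 240
    batch_size: int = 64
    base_lr: float = 0.01
    milestones: Tuple[int, ...] = (150, 180, 210)
    gamma: float = 0.1        # "decayed by a factor of 10"

    def __post_init__(self) -> None:
        # One dropout probability per augmented view.
        if len(self.dropout_probs) != self.num_views:
            raise ValueError("expected one dropout probability per view")


def step_decay_lr(epoch: int, cfg: ReportedSetup = ReportedSetup()) -> float:
    """Learning rate for a 0-indexed epoch of the student phase, assuming the
    decay applies from the start of each milestone epoch."""
    if not 0 <= epoch < cfg.student_epochs:
        raise ValueError("epoch outside the student training phase")
    # Count how many milestones have been reached (milestone <= epoch).
    return cfg.base_lr * cfg.gamma ** bisect_right(cfg.milestones, epoch)
```

For example, step_decay_lr(0) and step_decay_lr(149) give 0.01, while step_decay_lr(150) gives about 0.001 and step_decay_lr(239) about 1e-5, up to floating-point rounding.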
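The Experiment Setup row mentions a softmax temperature τ_Z = 4 for the view augmentation heads and uniform weights for ensembling. The sketch below shows only the generic operations behind those two settings: a temperature-scaled softmax and a uniform average of per-member probability vectors. It is not the paper's implementation. The excerpt does not say whether ensemble members are averaged as probabilities or as logits, or exactly which outputs form the ensemble, so averaging probabilities here is an assumption, and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], tau: float) -> List[float]:
    """Softmax of logits / tau; larger tau gives a flatter distribution."""
    if tau <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_ensemble(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities over members with equal weights
    (assumption: members are combined in probability space)."""
    n = len(member_probs)
    if n == 0:
        raise ValueError("need at least one ensemble member")
    return [sum(column) / n for column in zip(*member_probs)]
```

A higher temperature flattens the distribution: for logits [2, 1, 0], the largest probability is about 0.67 at τ = 1 and about 0.42 at τ = 4.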