Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

PLD: A Choice-Theoretic List-Wise Knowledge Distillation

Authors: Ejafa Bassam, Dawei Zhu, Kaigui Bian

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, across CIFAR-100, ImageNet-1K, and MS-COCO, PLD achieves consistent gains across diverse architectures and distillation objectives, including divergence-based, correlation-based, and feature-based methods, in both homogeneous and heterogeneous teacher-student pairs. The paper also includes a dedicated experiments section (Section 5) with detailed results, tables, and figures.
Researcher Affiliation	Academia	Ejafa Bassam, Dawei Zhu, Kaigui Bian; School of Computer Science, Peking University.
Pseudocode	Yes	The paper includes a dedicated section, C ("Implementation of the PLD Loss"), which presents a Python code block implementing the PLD loss function.
Open Source Code	Yes	In the NeurIPS Paper Checklist, Question 5 ("Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?") is answered "[Yes]".
Open Datasets	Yes	We evaluate PLD on three representative visual recognition datasets: CIFAR-100 [15], ImageNet-1K [4], and MS-COCO [20].
Dataset Splits	Yes	The dataset contains 50,000 training images and 10,000 validation images across 100 categories at 32 × 32 resolution (for CIFAR-100). For validation we adopt the "A-recipe" from [44]. Specifically, we set the test resolution r = 224 and test crop ratio ρ = 0.95, then resize_min = r/ρ ≈ 236, apply a bicubic resize of the shorter side to resize_min, followed by a center crop of size r × r (for ImageNet-1K).
Hardware Specification	Yes	We train for 100 epochs with an effective batch size of 2048 images (256 per GPU across eight NVIDIA A100 SXM4 80 GB accelerators) using the LAMB [49] optimizer.
Software Dependencies	No	The paper uses PyTorch (implied by the import statements in Section C), refers to the timm library [43], and discusses optimizers such as LAMB [49], AdamW [21], Adan [46], and AdaBelief [56]. However, it gives no version numbers for any of these components, and such version numbers are needed for reproducibility.
Experiment Setup	Yes	All models are trained from scratch for 250 epochs. We use the AdamW optimizer with β1 = 0.9 and β2 = 0.999. The learning rate follows a cosine-annealing schedule, starting at 0.001. We set weight decay to 0.5 and the batch size to 128. We apply standard data augmentations: random cropping, horizontal flipping, and per-channel normalization (for CIFAR-100). We train for 100 epochs with an effective batch size of 2048 images (...) using the LAMB [49] optimizer, an initial learning rate of 5 × 10^-3 decayed by a cosine schedule and linearly warmed up over the first 5 epochs, and weight decay fixed at 0.02 (for ImageNet-1K).
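The Pseudocode row notes that Section C of the paper gives a Python implementation of the PLD loss. That code is not reproduced here. The sketch below is a hypothetical, dependency-free illustration of a choice-theoretic list-wise loss for a single sample, and it rests on several assumptions. It assumes a Plackett-Luce choice model. Classes are ranked with the gold label first and the remaining classes in descending teacher logit. Each ranked position contributes the negative log-probability that the student picks that class from the classes still remaining. Each term is weighted by the teacher's softmax probability for that class. The function names and the temperature `tau` are assumptions; Section C of the paper is the authority on the actual loss.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], tau: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature tau."""
    scaled = [x / tau for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def teacher_optimal_ranking(teacher_logits: Sequence[float], target: int) -> List[int]:
    """Assumed ranking: gold label first, then other classes by descending teacher logit."""
    rest = [c for c in range(len(teacher_logits)) if c != target]
    rest.sort(key=lambda c: teacher_logits[c], reverse=True)
    return [target] + rest


def pl_position_nll(scores: Sequence[float], ranking: Sequence[int]) -> List[float]:
    """For each k, -log P(ranking[k] is chosen from ranking[k:]) under a
    Plackett-Luce model with worth exp(scores[c]), via a running suffix log-sum-exp."""
    nll = [0.0] * len(ranking)
    suffix_lse = -math.inf  # log-sum-exp over the (initially empty) suffix
    for k in range(len(ranking) - 1, -1, -1):
        s = scores[ranking[k]]
        hi = max(suffix_lse, s)
        suffix_lse = hi + math.log(math.exp(suffix_lse - hi) + math.exp(s - hi))
        nll[k] = suffix_lse - s
    return nll


def pld_loss_sketch(student_logits: Sequence[float], teacher_logits: Sequence[float],
                    target: int, tau: float = 1.0) -> float:
    """Hypothetical single-sample loss: each position's Plackett-Luce term is
    weighted by the teacher's softmax confidence in the class at that position."""
    ranking = teacher_optimal_ranking(teacher_logits, target)
    probs = softmax(teacher_logits, tau)
    terms = pl_position_nll(student_logits, ranking)
    return sum(probs[c] * t for c, t in zip(ranking, terms))
```

For example, `pld_loss_sketch([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], target=0)` returns a small positive value. The value grows as the student's scores disagree with the assumed ranking. Adding a constant to all student logits leaves the value unchanged, because every term is a difference against a log-sum-exp.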
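The ImageNet-1K evaluation transform in the Dataset Splits row reduces to simple geometry. The sketch below computes that geometry for one image under two assumptions. First, sizes are rounded to the nearest integer, so 224 / 0.95 ≈ 235.79 becomes 236, as the row states. Second, the crop is centered using integer floor offsets. The function name is hypothetical, and the function only computes sizes; it does not resample pixels.

```python
from typing import Tuple


def a_recipe_eval_geometry(width: int, height: int, r: int = 224,
                           rho: float = 0.95) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Return the resized (width, height) and the center-crop box
    (left, top, right, bottom) for the evaluation transform described above."""
    resize_min = round(r / rho)  # 224 / 0.95 ≈ 235.79, rounded to 236
    # Scale the shorter side to resize_min and keep the aspect ratio.
    if width <= height:
        new_w, new_h = resize_min, round(height * resize_min / width)
    else:
        new_w, new_h = round(width * resize_min / height), resize_min
    # Center an r x r crop inside the resized image.
    left = (new_w - r) // 2
    top = (new_h - r) // 2
    return (new_w, new_h), (left, top, left + r, top + r)
```

For a 500 × 375 image, this gives a 315 × 236 resize and the crop box (45, 6, 269, 230). With Pillow, the results could be applied as `img.resize(size, Image.Resampling.BICUBIC).crop(box)`.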
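The Software Dependencies row is scored "No" because the paper reports no version numbers. As an illustration only (the paper does not describe doing this), a training script could record the installed versions of its dependencies with the standard-library `importlib.metadata` module. The helper name `installed_versions` and the package names passed to it are hypothetical choices.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions
```

For example, `installed_versions(["torch", "timm"])` could be written to a log file alongside each run.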
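The Experiment Setup row quotes two learning-rate schedules. The sketch below encodes them under several assumptions. The rate is updated once per epoch. The cosine decays to zero. The ImageNet-1K warm-up ramps linearly from one fifth of the peak at epoch 0 to the peak at epoch 4. The `TrainConfig` class and `lr_at_epoch` function are hypothetical, and settings the row does not quote, such as the ImageNet-1K augmentations, are omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    epochs: int
    peak_lr: float
    warmup_epochs: int
    weight_decay: float
    batch_size: int  # effective (global) batch size
    optimizer: str


# Values as quoted in the Experiment Setup and Hardware Specification rows.
CIFAR100 = TrainConfig(epochs=250, peak_lr=1e-3, warmup_epochs=0,
                       weight_decay=0.5, batch_size=128, optimizer="AdamW")
IMAGENET1K = TrainConfig(epochs=100, peak_lr=5e-3, warmup_epochs=5,
                         weight_decay=0.02, batch_size=256 * 8, optimizer="LAMB")


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Learning rate for a 0-based epoch: linear warm-up, then cosine annealing to 0."""
    if epoch < cfg.warmup_epochs:
        return cfg.peak_lr * (epoch + 1) / cfg.warmup_epochs
    progress = (epoch - cfg.warmup_epochs) / (cfg.epochs - cfg.warmup_epochs)
    return 0.5 * cfg.peak_lr * (1.0 + math.cos(math.pi * progress))
```

With these assumptions, the ImageNet-1K rate climbs to 5 × 10^-3 by epoch 4 and then decays. The CIFAR-100 rate starts at 0.001 and decays over 250 epochs. The effective ImageNet-1K batch size is 256 × 8 = 2048.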