Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack

Authors: Yukun Chen, Boheng Li, Yu Yuan, Leyi Qi, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across diverse datasets, model architectures, and KD techniques validate the effectiveness of our SCAR and its resistance against existing backdoor detection, highlighting a significant yet previously overlooked vulnerability in the KD process.
Researcher Affiliation	Academia	(1) State Key Laboratory of Blockchain and Data Security, Zhejiang University; (2) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; (3) Nanyang Technological University
Pseudocode	Yes	Algorithm 1 SCAR Training Process
Open Source Code	Yes	Our code is available at https://github.com/WhitolfChen/SCAR.
Open Datasets	Yes	We conduct experiments on two classical benchmark datasets, including CIFAR-10 [40] and (a subset of) ImageNet [14] containing 50 classes. In our experiments, we use only open-source datasets, namely CIFAR-10 [40], ImageNet [14] and CINIC-10 [12], for evaluation.
Dataset Splits	Yes	The CIFAR-10 dataset [40] contains 50,000 training samples and 10,000 testing samples in total. The dataset has 10 classes and each class has 5,000 training samples and 1,000 testing samples. In this paper, we select a subset with 50 different classes and each class contains 500 training samples and 100 testing samples with size 3 × 224 × 224.
Hardware Specification	Yes	All our experiments are implemented with RTX 4090 GPUs.
Software Dependencies	No	In our implementations, we utilize PyTorch as the deep learning framework.
Experiment Setup	Yes	We utilize the Adam as the optimizer and the batch size is set to 256 on CIFAR-10 and 128 on ImageNet. We set the initial learning rate as 10^-4 and train all models for 200 epochs, with the learning rate reduced by a cosine annealing schedule. The outer optimization runs for 200 iterations, with 20 inner-loop updates per outer iteration, and each fixed-point iteration runs for 100 steps. The teacher model is optimized using Adam with an initial learning rate of 10^-4, and the learning rate is decayed using a cosine annealing schedule. The hyperparameters, α, β, γ, and δ, are set to 1.
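
The cosine annealing schedule mentioned in the Experiment Setup row can be illustrated with a minimal sketch. It uses the standard closed-form cosine annealing formula and assumes one scheduler step per epoch, a minimum learning rate of 0, and an initial rate of 1e-4 over 200 epochs; the function name and the minimum-rate parameter are illustrative assumptions, not taken from the paper or its code.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=200, base_lr=1e-4, min_lr=0.0):
    """Learning rate at `epoch` under closed-form cosine annealing.

    Assumptions (not stated in the source): one step per epoch and min_lr = 0.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at every epoch boundary, from epoch 0 to epoch 200.
schedule = [cosine_annealing_lr(e) for e in range(201)]
```

Under these assumptions the rate starts at 1e-4, passes through half that value at epoch 100, and decays to 0 at epoch 200.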