Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Undistillable Models by Minimizing Conditional Mutual Information
Authors: Linfeng Ye, Shayan Mohajer Hamidi, En-Hui Yang
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The resulting CMIM model is shown, by extensive experiments, to be undistillable by all tested KD methods existing in the literature. We conduct extensive experiments on three image classification datasets, namely CIFAR-100 (Krizhevsky et al., 2012), Tiny ImageNet (Le & Yang, 2015), and ImageNet (Deng et al., 2009). |
| Researcher Affiliation | Academia | Linfeng Ye, Department of Electrical and Computer Engineering, University of Waterloo; Shayan Mohajer Hamidi, Department of Electrical and Computer Engineering, University of Waterloo; En-Hui Yang, Department of Electrical and Computer Engineering, University of Waterloo |
| Pseudocode | Yes | The proposed alternating algorithm for optimization problem equation 27 is summarized in Algorithm 1. |
| Open Source Code | Yes | The code for the paper is publicly available at https://anonymous.4open.science/r/CMIM-605C/README.md. |
| Open Datasets | Yes | We conduct extensive experiments on three image classification datasets, namely CIFAR-100 (Krizhevsky et al., 2012), Tiny ImageNet (Le & Yang, 2015), and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | The CIFAR-100 (Krizhevsky et al., 2012) dataset contains 50K training and 10K test color images, each of size 32×32, categorized into 100 classes. Tiny ImageNet (Le & Yang, 2015) contains 120K color images across 200 classes, each with a resolution of 64×64 pixels. For each class, there are 500 training images, 50 validation images, and 50 test images. ImageNet (Deng et al., 2009) is a large-scale dataset used in visual recognition tasks, containing around 1.2 million training and 50K validation images. |
| Hardware Specification | Yes | For each experiment, we utilized 16 CPU cores, 64 GB of memory, and one NVIDIA V100 GPU. |
| Software Dependencies | Yes | The software environment comprised Python 3.10, PyTorch 1.13, and CUDA 11. |
| Experiment Setup | Yes | For all experiments, including defenses and attacks, the SGD optimizer (Robbins & Monro, 1951; LeCun et al., 2002) with a learning rate of 0.1 is used unless otherwise specified. For the CIFAR-100 and Tiny ImageNet datasets, we train the model for 200 epochs, decaying the learning rate by 0.1 at epochs 60, 120, and 160. For ImageNet, we follow the standard PyTorch practice. The batch size is 128 for both CIFAR-100 and Tiny ImageNet, and 256 for ImageNet. To get the accuracy that a knockoff student can achieve using label smoothing, we have tested a wide spectrum of label smoothing factors ϵ = {0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, and selected the value that yielded a classification accuracy exceeding that of all knockoff students. In the CMIM method, we set T = 20 and tested λ = 0.1, 0.25, 0.5, 1, selecting the value that minimized the CMI value while maintaining or improving classification accuracy. |
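The quoted setup uses a standard step-decay learning-rate schedule (base rate 0.1, multiplied by 0.1 at epochs 60, 120, and 160 over 200 epochs). A minimal pure-Python sketch of that schedule, for readers checking the reported hyperparameters; the function name and signature are illustrative, not from the paper:

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(60, 120, 160), gamma=0.1):
    """Step-decay schedule as described in the reported setup:
    the learning rate is multiplied by `gamma` at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Example: rates in each of the four training phases
print(lr_at_epoch(0), lr_at_epoch(60), lr_at_epoch(120), lr_at_epoch(160))
```

In a PyTorch 1.13 environment matching the reported dependencies, the equivalent behavior would come from `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[60, 120, 160]` and `gamma=0.1`.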