Zero-Shot Knowledge Distillation in Deep Networks

Authors: Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, Venkatesh Babu Radhakrishnan, Anirban Chakraborty

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our ZSKD approach via an empirical evaluation over multiple benchmark datasets and model architectures (sec. 4).
Researcher Affiliation | Academia | (1) Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India; (2) School of Informatics, University of Edinburgh, United Kingdom; (3) University of Lincoln, United Kingdom. Correspondence to: Gaurav Kumar Nayak <gauravnayak@iisc.ac.in>
Pseudocode | Yes | Algorithm 1 Zero-Shot Knowledge Distillation
Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code will be released.
Open Datasets | Yes | MNIST (LeCun et al., 1998), Fashion MNIST (FMNIST) (Xiao et al., 2017), and CIFAR-10 (Krizhevsky & Hinton, 2009).
Dataset Splits | No | The paper specifies training and test set sizes for MNIST (60000 training, 10000 test), Fashion MNIST (60000 training, 10000 test), and CIFAR-10 (50000 training, 10000 test), but does not explicitly mention a separate validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Input images are resized from 28x28 to 32x32 and the pixel values are normalized to be in [0, 1] before feeding into the models. We consider two (B = 2) scaling factors, β1 = 1.0 and β2 = 0.1 across all the datasets, i.e., for each dataset, half the Data Impressions are generated with β1 and the other with β2. A temperature value (τ) of 20 is used across all the datasets. We augment the samples using regular operations such as scaling, translation, rotation, flipping etc.
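
The reported setup values (τ = 20, β1 = 1.0, β2 = 0.1, pixels in [0, 1]) can be illustrated with a minimal NumPy sketch. This is not the authors' code, which the page notes is not released. The function names, the assumption of 8-bit input images, and the choice to drop any remainder when the number of Data Impressions is not divisible by B are all hypothetical and made only for illustration.

```python
import numpy as np

TAU = 20.0          # temperature reported in the experiment setup
BETAS = (1.0, 0.1)  # the two scaling factors (B = 2) reported in the setup


def softmax_with_temperature(logits, tau=TAU):
    """Softmax of logits / tau along the last axis, computed stably."""
    z = np.asarray(logits, dtype=np.float64) / tau
    z = z - z.max(axis=-1, keepdims=True)  # shift to avoid overflow in exp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def normalize_pixels(images_uint8):
    """Map pixel values to floats in [0, 1] (assumes 8-bit input)."""
    return np.asarray(images_uint8, dtype=np.float32) / 255.0


def assign_betas(num_impressions, betas=BETAS):
    """Assign a scaling factor to each Data Impression, split evenly.

    Assumption: any remainder of num_impressions / len(betas) is dropped.
    """
    per_beta = num_impressions // len(betas)
    return np.repeat(np.asarray(betas, dtype=np.float64), per_beta)
```

With a high temperature such as τ = 20, the softened output is much closer to uniform than the plain softmax, which is why the largest probability shrinks as τ grows.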