Zero-Shot Knowledge Distillation in Deep Networks
Authors: Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, Venkatesh Babu Radhakrishnan, Anirban Chakraborty
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our ZSKD approach via an empirical evaluation over multiple benchmark datasets and model architectures (sec. 4). |
| Researcher Affiliation | Academia | 1Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India 2School of Informatics, University of Edinburgh, United Kingdom 3University of Lincoln, United Kingdom. Correspondence to: Gaurav Kumar Nayak <gauravnayak@iisc.ac.in> |
| Pseudocode | Yes | Algorithm 1 Zero-Shot Knowledge Distillation (a hedged code sketch of the procedure follows the table). |
| Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code will be released. |
| Open Datasets | Yes | MNIST (LeCun et al., 1998), Fashion MNIST (FMNIST) (Xiao et al., 2017), and CIFAR-10 (Krizhevsky & Hinton, 2009). |
| Dataset Splits | No | The paper specifies training and test set sizes for MNIST (60000 training, 10000 test), Fashion MNIST (60000 training, 10000 test), and CIFAR-10 (50000 training, 10000 test), but does not explicitly mention a separate validation split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Input images are resized from 28x28 to 32x32 and the pixel values are normalized to be in [0, 1] before feeding into the models. We consider two (B = 2) scaling factors, β1 = 1.0 and β2 = 0.1, across all the datasets, i.e., for each dataset, half the Data Impressions are generated with β1 and the other half with β2. A temperature value (τ) of 20 is used across all the datasets. We augment the samples using regular operations such as scaling, translation, rotation, flipping, etc. (A sketch of this pipeline follows the table.) |
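The pseudocode row references Algorithm 1 of the paper. Below is a minimal sketch of that procedure, assuming PyTorch (the paper lists no framework, and the function names, optimizer choice, step count, and learning rate are illustrative assumptions): class similarities are computed from the teacher's final-layer weights, per-class softmax targets are sampled from a Dirichlet whose concentration is β times a similarity row, and a random input is optimized until the teacher's tempered softmax matches the sampled target.

```python
import torch
import torch.nn.functional as F

def class_similarity(final_layer_weights):
    """Cosine similarity between the teacher's class weight vectors,
    rescaled so each row can serve as a Dirichlet concentration."""
    w = F.normalize(final_layer_weights, dim=1)   # (num_classes, feat_dim)
    c = w @ w.t()
    c = (c - c.min()) / (c.max() - c.min())       # rescale into [0, 1]
    return c.clamp(min=1e-3)                      # concentrations must be > 0

def sample_softmax_target(sim_row, beta):
    """Sample one soft label for a class from Dir(beta * similarity row)."""
    return torch.distributions.Dirichlet(beta * sim_row).sample()

def generate_data_impression(teacher, target, shape=(1, 1, 32, 32),
                             tau=20.0, steps=1500, lr=0.01):
    """Optimize a random input so the teacher's softmax at temperature
    tau matches `target` (one Data Impression). `steps` and `lr` are
    illustrative values, not taken from the paper."""
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = teacher(torch.sigmoid(x))        # keep pixels in [0, 1]
        # cross-entropy between the sampled soft target and tempered softmax
        loss = -(target * F.log_softmax(logits / tau, dim=1)).sum()
        loss.backward()
        opt.step()
    return torch.sigmoid(x).detach()
```

Per the experiment-setup row, half of the Data Impressions would be generated with β = 1.0 and the other half with β = 0.1; the student is then trained on the impressions with a standard temperature-τ distillation loss.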
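Likewise, a sketch of the stated input pipeline and distillation objective (resize to 32x32, pixels in [0, 1], τ = 20), again assuming PyTorch/torchvision; the augmentation magnitudes are assumptions, since the paper names only the operation types:

```python
import torch.nn.functional as F
import torchvision.transforms as T

# Resize 28x28 inputs to 32x32; ToTensor already maps pixels into [0, 1].
preprocess = T.Compose([
    T.Resize((32, 32)),
    T.ToTensor(),
])

# "Scaling, translation, rotation, flipping"; the ranges below are
# illustrative choices, not taken from the paper.
augment = T.Compose([
    T.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    T.RandomHorizontalFlip(),
])

def distillation_loss(student_logits, teacher_logits, tau=20.0):
    """Standard KD objective at temperature tau (tau = 20 in the paper):
    KL divergence between tempered softmaxes, scaled by tau**2."""
    p_teacher = F.softmax(teacher_logits / tau, dim=1)
    log_p_student = F.log_softmax(student_logits / tau, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau**2
```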