Learning Student Networks with Few Data

Authors: Shumin Kong, Tianyu Guo, Shan You, Chang Xu (pp. 4469-4476)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on benchmark datasets validate the effectiveness of our proposed method." and "Now we empirically evaluate the proposed algorithm on popular benchmark datasets, including CIFAR-10 dataset, CIFAR-100 dataset and Fashion-MNIST dataset."
Researcher Affiliation | Collaboration | Shumin Kong (1), Tianyu Guo (1,2), Shan You (3), Chang Xu (1); (1) School of Computer Science, Faculty of Engineering, The University of Sydney, Australia; (2) Key Laboratory of Machine Perception (MOE), CMIC, School of EECS, Peking University, China; (3) SenseTime Research, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | "Now we empirically evaluate the proposed algorithm on popular benchmark datasets, including CIFAR-10 dataset, CIFAR-100 dataset and Fashion-MNIST dataset." followed by citations such as (Krizhevsky 2009) for CIFAR and (Xiao, Rasul, and Vollgraf 2017) for Fashion-MNIST.
Dataset Splits | No | The paper specifies training and testing set sizes (e.g., "50,000 of the images are training set and the remaining 10,000 images are intended for testing" for CIFAR-10/100, and "60,000 and 10,000, respectively" for Fashion-MNIST) but does not describe a validation split; a loading sketch illustrating these splits appears below the table.
Hardware Specification | Yes | The experiments are run on a single NVIDIA GeForce 1080 Ti GPU.
Software Dependencies | No | The paper mentions training methods such as "back propagation and Stochastic Gradient Descent (SGD)" but does not specify the software libraries, frameworks, or version numbers used to implement the experiments.
Experiment Setup | Yes | "In our experiments, ϵ is set to 1 and α is set to 0.001. Temperature T for KD loss is set to 3. On both datasets, the student networks are trained using back propagation and Stochastic Gradient Descent (SGD) with momentum for 500 epochs. During training, the learning rate and the momentum decay linearly." A hedged training-loop sketch using these values follows the table.
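
As referenced in the Dataset Splits row, the following is a minimal, hypothetical sketch (not code from the paper) that loads the three benchmark datasets with torchvision and reproduces the quoted training/test sizes. The data directory, the random seed, and the 45,000/5,000 validation carve-out are assumptions; the paper itself describes no validation split.

```python
# Hypothetical sketch: loading CIFAR-10/100 and Fashion-MNIST with torchvision
# to illustrate the train/test sizes quoted in the reproducibility table.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# CIFAR-10 / CIFAR-100: 50,000 training images and 10,000 test images each.
cifar10_train = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
cifar10_test = datasets.CIFAR10("./data", train=False, download=True, transform=to_tensor)

# Fashion-MNIST: 60,000 training images and 10,000 test images.
fmnist_train = datasets.FashionMNIST("./data", train=True, download=True, transform=to_tensor)
fmnist_test = datasets.FashionMNIST("./data", train=False, download=True, transform=to_tensor)

# The paper does not describe a validation split; if one were needed, it could
# be carved out of the training set, e.g. a 45k/5k split for CIFAR-10 (assumption).
train_subset, val_subset = random_split(
    cifar10_train, [45_000, 5_000], generator=torch.Generator().manual_seed(0)
)
print(len(cifar10_train), len(cifar10_test), len(train_subset), len(val_subset))
```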
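The Experiment Setup row quotes the paper's hyperparameters. Below is a hedged sketch, not the authors' implementation: it assumes a standard Hinton-style knowledge-distillation objective with temperature T = 3 and SGD with momentum, and applies the linear decay of learning rate and momentum over 500 epochs described in the quote. The initial learning rate and momentum values are assumptions, and the roles of ϵ and α in the paper's full few-data objective are not reproduced here.

```python
# Hypothetical sketch of the quoted training setup (assumptions noted inline).
import torch
import torch.nn.functional as F

T = 3.0                # temperature for the KD loss (from the paper)
EPOCHS = 500           # number of training epochs (from the paper)
LR0, MOM0 = 0.1, 0.9   # assumed initial learning rate and momentum; not stated in the paper

def kd_loss(student_logits, teacher_logits, temperature=T):
    """Standard Hinton-style KD loss between temperature-softened distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def train(student, teacher, loader, device="cuda"):
    optimizer = torch.optim.SGD(student.parameters(), lr=LR0, momentum=MOM0)
    teacher.eval()
    for epoch in range(EPOCHS):
        # Linear decay of the learning rate and the momentum, as described in the paper.
        decay = 1.0 - epoch / EPOCHS
        for group in optimizer.param_groups:
            group["lr"] = LR0 * decay
            group["momentum"] = MOM0 * decay
        for images, _ in loader:
            images = images.to(device)
            with torch.no_grad():
                teacher_logits = teacher(images)
            loss = kd_loss(student(images), teacher_logits)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

This sketch only covers the distillation term; the paper's own loss additionally involves the ϵ and α parameters quoted above, whose exact formulation is not reconstructed here.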