FitNets: Hints for Thin Deep Nets

Authors: Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio

ICLR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate the proposed method on MNIST, CIFAR-10, CIFAR-100, SVHN and AFLW benchmark datasets and provide evidence that our method matches or outperforms the teacher's performance, while requiring notably fewer parameters and multiplications."
Researcher Affiliation | Academia | "Adriana Romero¹, Nicolas Ballas², Samira Ebrahimi Kahou³, Antoine Chassang², Carlo Gatta⁴ & Yoshua Bengio². ¹Universitat de Barcelona, Barcelona, Spain. ²Université de Montréal, Montréal, Québec, Canada. CIFAR Senior Fellow. ³École Polytechnique de Montréal, Montréal, Québec, Canada. ⁴Centre de Visió per Computador, Bellaterra, Spain."
Pseudocode | Yes | "Algorithm 1 FitNet Stage-Wise Training." (both stages are sketched in code after the table)
Open Source Code | Yes | "Code to reproduce the experiments publicly available: https://github.com/adri-romsor/FitNets"
Open Datasets | Yes | "The CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton, 2009)... The SVHN dataset (Netzer et al., 2011)... MNIST dataset (LeCun et al., 1998)... AFLW (Koestinger et al., 2011)"
Dataset Splits | Yes | "On CIFAR-10, we divided the training set into 40K training examples and 10K validation examples."
Hardware Specification | No | The paper states "on a GPU" but does not give specific hardware details such as the GPU model, CPU type, or memory used for the experiments.
Software Dependencies | No | The paper mentions software such as Theano and Pylearn2 but does not provide version numbers for these or any other dependencies.
Experiment Setup | Yes | "All FitNet parameters were initialized randomly in U(-0.005, 0.005). We used stochastic gradient descent with RMSProp (Tieleman & Hinton, 2012) to train the FitNets, with an initial learning rate of 0.005 and a mini-batch size of 128. Parameter λ in Eq. (2) was initialized to 4 and decayed linearly during 500 epochs reaching λ = 1. The relaxation term τ was set to 3." (these hyperparameters appear in the stage 2 sketch below)
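The stage-wise procedure of Algorithm 1 is short enough to summarize in code. Below is a minimal sketch of stage 1 (hint-based training), assuming a PyTorch-style API rather than the authors' Theano/Pylearn2 stack; `hint_features`, `guided_features`, and `regressor` are hypothetical stand-ins for the teacher's hint layer, the student's guided layer, and the regressor described in the paper, not names from the released code.

```python
import torch
import torch.nn.functional as F

def init_fitnet(student):
    # Reported setup: all FitNet parameters initialized randomly in U(-0.005, 0.005).
    for p in student.parameters():
        torch.nn.init.uniform_(p, -0.005, 0.005)

def train_stage1_hints(teacher, student, regressor, loader, epochs, lr=0.005):
    """Stage 1: train the student up to its guided layer so that, after the
    regressor, it matches the teacher's hint layer under an L2 loss.
    `loader` is assumed to yield mini-batches (size 128 in the paper)."""
    params = list(student.parameters()) + list(regressor.parameters())
    optimizer = torch.optim.RMSprop(params, lr=lr)  # RMSProp with lr 0.005, as reported
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                hint = teacher.hint_features(x)             # teacher's hint-layer output
            guided = regressor(student.guided_features(x))  # regressed student features
            loss = F.mse_loss(guided, hint)                 # L2 hint loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```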
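Stage 2 then trains the whole FitNet on the knowledge-distillation objective of Eq. (2), using the hyperparameters quoted above: λ decayed linearly from 4 to 1 over 500 epochs and temperature τ = 3. The sketch below keeps the same PyTorch assumption; it uses KL divergence in place of the cross-entropy between the softened distributions, which differs only by a term that is constant in the student's parameters.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, lam, tau=3.0):
    # Eq. (2): hard-label cross-entropy plus lam times the divergence
    # between temperature-softened teacher and student outputs (tau = 3).
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    )
    return hard + lam * soft

def train_stage2_kd(teacher, student, loader, epochs=500, lr=0.005):
    """Stage 2: train the full (hint-pretrained) student with the KD loss,
    annealing lam linearly from 4 to 1 over the 500 reported epochs."""
    optimizer = torch.optim.RMSprop(student.parameters(), lr=lr)
    for epoch in range(epochs):
        lam = 4.0 - 3.0 * epoch / max(epochs - 1, 1)  # linear decay: 4 -> 1
        for x, y in loader:
            with torch.no_grad():
                teacher_logits = teacher(x)
            loss = kd_loss(student(x), teacher_logits, y, lam)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```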