FitNets: Hints for Thin Deep Nets
Authors: Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the proposed method on MNIST, CIFAR-10, CIFAR-100, SVHN and AFLW benchmark datasets and provide evidence that our method matches or outperforms the teacher's performance, while requiring notably fewer parameters and multiplications. |
| Researcher Affiliation | Academia | Adriana Romero¹, Nicolas Ballas², Samira Ebrahimi Kahou³, Antoine Chassang², Carlo Gatta⁴ & Yoshua Bengio². ¹Universitat de Barcelona, Barcelona, Spain. ²Université de Montréal, Montréal, Québec, Canada. CIFAR Senior Fellow. ³École Polytechnique de Montréal, Montréal, Québec, Canada. ⁴Centre de Visió per Computador, Bellaterra, Spain. |
| Pseudocode | Yes | Algorithm 1: FitNet Stage-Wise Training. (A hedged sketch of the two-stage procedure appears after the table.) |
| Open Source Code | Yes | Code to reproduce the experiments publicly available: https://github.com/adri-romsor/FitNets |
| Open Datasets | Yes | The CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton, 2009)... The SVHN dataset (Netzer et al., 2011)... MNIST dataset (LeCun et al., 1998)... AFLW (Koestinger et al., 2011) |
| Dataset Splits | Yes | On CIFAR-10, we divided the training set into 40K training examples and 10K validation examples. (A sketch reproducing this split follows the table.) |
| Hardware Specification | No | The paper states 'on a GPU' but does not provide specific hardware details like GPU model numbers, CPU types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions software like Theano and Pylearn2, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | All FitNet parameters were initialized randomly in U(-0.005, 0.005). We used stochastic gradient descent with RMSProp (Tieleman & Hinton, 2012) to train the FitNets, with an initial learning rate of 0.005 and a mini-batch size of 128. Parameter λ in Eq. (2) was initialized to 4 and decayed linearly during 500 epochs, reaching λ = 1. The relaxation term τ was set to 3. (These values are collected in the hyperparameter sketch after the table.) |
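Since the table only confirms that pseudocode exists, a minimal sketch may help orient readers. The following PyTorch rendering of Algorithm 1 is an assumption-laden illustration (toy MLPs, a linear regressor, arbitrary hint/guided layer choices), not the authors' Theano/Pylearn2 code from the linked repository.

```python
# Hedged PyTorch sketch of FitNet stage-wise training (Algorithm 1).
# The toy MLPs on flattened 32x32x3 inputs, the layer indices and the
# linear `regressor` are illustrative assumptions; the paper uses deep
# convolutional maxout networks and a convolutional regressor.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(3072, 1200), nn.ReLU(),
                        nn.Linear(1200, 1200), nn.ReLU(),
                        nn.Linear(1200, 10))
student = nn.Sequential(nn.Linear(3072, 300), nn.ReLU(),
                        nn.Linear(300, 300), nn.ReLU(),
                        nn.Linear(300, 10))

hint = teacher[:2]     # teacher up to its hint layer (kept frozen)
guided = student[:2]   # student up to its guided layer
regressor = nn.Linear(300, 1200)  # maps guided width to hint width

def stage1_hint_loss(x):
    """Stage 1: L2 distance between the teacher's hint and the
    regressed output of the student's guided layer."""
    with torch.no_grad():
        h_t = hint(x)
    return 0.5 * F.mse_loss(regressor(guided(x)), h_t)

def stage2_kd_loss(x, y, lam, tau=3.0):
    """Stage 2: knowledge distillation over the whole student, Eq. (2):
    hard-label cross-entropy plus a lambda-weighted term on tau-softened
    teacher outputs (KL here, which equals the paper's cross-entropy
    term up to a constant independent of the student)."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    hard = F.cross_entropy(s_logits, y)
    soft = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                    F.softmax(t_logits / tau, dim=1),
                    reduction="batchmean")
    return hard + lam * soft
```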
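The CIFAR-10 split quoted in the table is mechanical to reproduce. The sketch below uses torchvision, which is an assumption (the paper predates it), and its random partition need not match the authors' exact one.

```python
# Hedged sketch of the reported CIFAR-10 40K/10K train/validation split;
# torchvision is an assumption, and the random split here need not match
# the authors' exact partition.
import torch
from torchvision import datasets, transforms

full_train = datasets.CIFAR10(root="data", train=True, download=True,
                              transform=transforms.ToTensor())
train_set, val_set = torch.utils.data.random_split(
    full_train, [40_000, 10_000],
    generator=torch.Generator().manual_seed(0))  # seeded for repeatability
```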
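Finally, the quoted experiment setup maps directly onto optimizer and schedule code. This sketch reuses `student`, `train_set`, and `stage2_kd_loss` from the sketches above and should be read as a rendering of the stated values under those assumptions, not the authors' training script.

```python
# Hedged rendering of the reported setup: parameters initialized in
# U(-0.005, 0.005), RMSProp with initial learning rate 0.005, mini-batch
# size 128, lambda decayed linearly from 4 to 1 over 500 epochs, tau = 3.
# Reuses `student`, `train_set`, `stage2_kd_loss` from the sketches above.
import torch
import torch.nn as nn

for p in student.parameters():
    nn.init.uniform_(p, -0.005, 0.005)  # random init in U(-0.005, 0.005)

# In the full procedure, stage-1 hint training would run here, between
# initialization and the stage-2 distillation loop below.

optimizer = torch.optim.RMSprop(student.parameters(), lr=0.005)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

EPOCHS = 500
for epoch in range(EPOCHS):
    lam = 4.0 - 3.0 * epoch / (EPOCHS - 1)  # linear decay: lambda 4 -> 1
    for x, y in loader:
        optimizer.zero_grad()
        loss = stage2_kd_loss(x.flatten(1), y, lam, tau=3.0)
        loss.backward()
        optimizer.step()
```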