FitNets: Hints for Thin Deep Nets
Authors: Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the proposed method on MNIST, CIFAR-10, CIFAR-100, SVHN and AFLW benchmark datasets and provide evidence that our method matches or outperforms the teacher's performance, while requiring notably fewer parameters and multiplications. |
| Researcher Affiliation | Academia | Adriana Romero¹, Nicolas Ballas², Samira Ebrahimi Kahou³, Antoine Chassang², Carlo Gatta⁴ & Yoshua Bengio². ¹Universitat de Barcelona, Barcelona, Spain. ²Université de Montréal, Montréal, Québec, Canada. CIFAR Senior Fellow. ³École Polytechnique de Montréal, Montréal, Québec, Canada. ⁴Centre de Visió per Computador, Bellaterra, Spain. |
| Pseudocode | Yes | Algorithm 1: FitNet Stage-Wise Training. (A hedged sketch of the two-stage procedure appears after the table.) |
| Open Source Code | Yes | Code to reproduce the experiments publicly available: https://github.com/adri-romsor/FitNets |
| Open Datasets | Yes | The CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton, 2009)... The SVHN dataset (Netzer et al., 2011)... MNIST dataset (LeCun et al., 1998)... AFLW (Koestinger et al., 2011) |
| Dataset Splits | Yes | On CIFAR-10, we divided the training set into 40K training examples and 10K validation examples. (A sketch reproducing this split follows the table.) |
| Hardware Specification | No | The paper states 'on a GPU' but does not provide specific hardware details like GPU model numbers, CPU types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions software like Theano and Pylearn2, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | All FitNet parameters were initialized randomly in U(-0.005, 0.005). We used stochastic gradient descent with RMSProp (Tieleman & Hinton, 2012) to train the FitNets, with an initial learning rate of 0.005 and a mini-batch size of 128. Parameter λ in Eq. (2) was initialized to 4 and decayed linearly during 500 epochs, reaching λ = 1. The relaxation term τ was set to 3. (These values are collected in the hyperparameter sketch after the table.) |
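Since the table only confirms that pseudocode exists, a minimal sketch may help orient readers. The following PyTorch rendering of Algorithm 1 is an assumption-laden illustration (toy MLPs, a linear regressor, arbitrary hint/guided layer choices), not the authors' Theano/Pylearn2 code from the linked repository.

```python
# Hedged PyTorch sketch of FitNet stage-wise training (Algorithm 1).
# The toy MLPs on flattened 32x32x3 inputs, the layer indices and the
# linear `regressor` are illustrative assumptions; the paper uses deep
# convolutional maxout networks and a convolutional regressor.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(3072, 1200), nn.ReLU(),
                        nn.Linear(1200, 1200), nn.ReLU(),
                        nn.Linear(1200, 10))
student = nn.Sequential(nn.Linear(3072, 300), nn.ReLU(),
                        nn.Linear(300, 300), nn.ReLU(),
                        nn.Linear(300, 10))

hint = teacher[:2]     # teacher up to its hint layer (kept frozen)
guided = student[:2]   # student up to its guided layer
regressor = nn.Linear(300, 1200)  # maps guided width to hint width

def stage1_hint_loss(x):
    """Stage 1: L2 distance between the teacher's hint and the
    regressed output of the student's guided layer."""
    with torch.no_grad():
        h_t = hint(x)
    return 0.5 * F.mse_loss(regressor(guided(x)), h_t)

def stage2_kd_loss(x, y, lam, tau=3.0):
    """Stage 2: knowledge distillation over the whole student, Eq. (2):
    hard-label cross-entropy plus a lambda-weighted term on tau-softened
    teacher outputs (KL here, which equals the paper's cross-entropy
    term up to a constant independent of the student)."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    hard = F.cross_entropy(s_logits, y)
    soft = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                    F.softmax(t_logits / tau, dim=1),
                    reduction="batchmean")
    return hard + lam * soft
```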
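The CIFAR-10 split quoted in the table is mechanical to reproduce. The sketch below uses torchvision, which is an assumption (the paper predates it), and its random partition need not match the authors' exact one.

```python
# Hedged sketch of the reported CIFAR-10 40K/10K train/validation split;
# torchvision is an assumption, and the random split here need not match
# the authors' exact partition.
import torch
from torchvision import datasets, transforms

full_train = datasets.CIFAR10(root="data", train=True, download=True,
                              transform=transforms.ToTensor())
train_set, val_set = torch.utils.data.random_split(
    full_train, [40_000, 10_000],
    generator=torch.Generator().manual_seed(0))  # seeded for repeatability
```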
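Finally, the quoted experiment setup maps directly onto optimizer and schedule code. This sketch reuses `student`, `train_set`, and `stage2_kd_loss` from the sketches above and should be read as a rendering of the stated values under those assumptions, not the authors' training script.

```python
# Hedged rendering of the reported setup: parameters initialized in
# U(-0.005, 0.005), RMSProp with initial learning rate 0.005, mini-batch
# size 128, lambda decayed linearly from 4 to 1 over 500 epochs, tau = 3.
# Reuses `student`, `train_set`, `stage2_kd_loss` from the sketches above.
import torch
import torch.nn as nn

for p in student.parameters():
    nn.init.uniform_(p, -0.005, 0.005)  # random init in U(-0.005, 0.005)

# In the full procedure, stage-1 hint training would run here, between
# initialization and the stage-2 distillation loop below.

optimizer = torch.optim.RMSprop(student.parameters(), lr=0.005)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

EPOCHS = 500
for epoch in range(EPOCHS):
    lam = 4.0 - 3.0 * epoch / (EPOCHS - 1)  # linear decay: lambda 4 -> 1
    for x, y in loader:
        optimizer.zero_grad()
        loss = stage2_kd_loss(x.flatten(1), y, lam, tau=3.0)
        loss.backward()
        optimizer.step()
```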