Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
FitNets: Hints for Thin Deep Nets
Authors: Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio
ICLR 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the proposed method on MNIST, CIFAR-10, CIFAR-100, SVHN and AFLW benchmark datasets and provide evidence that our method matches or outperforms the teacher's performance, while requiring notably fewer parameters and multiplications. |
| Researcher Affiliation | Academia | Adriana Romero1, Nicolas Ballas2, Samira Ebrahimi Kahou3, Antoine Chassang2, Carlo Gatta4 & Yoshua Bengio2 1Universitat de Barcelona, Barcelona, Spain. 2Université de Montréal, Montréal, Québec, Canada. CIFAR Senior Fellow. 3École Polytechnique de Montréal, Montréal, Québec, Canada. 4Centre de Visió per Computador, Bellaterra, Spain. |
| Pseudocode | Yes | Algorithm 1 FitNet Stage-Wise Training. |
| Open Source Code | Yes | Code to reproduce the experiments publicly available: https://github.com/adri-romsor/FitNets |
| Open Datasets | Yes | The CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton, 2009)... The SVHN dataset (Netzer et al., 2011)... MNIST dataset (LeCun et al., 1998)... AFLW (Koestinger et al., 2011) |
| Dataset Splits | Yes | On CIFAR-10, we divided the training set into 40K training examples and 10K validation examples. |
| Hardware Specification | No | The paper states 'on a GPU' but does not provide specific hardware details like GPU model numbers, CPU types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions software like Theano and Pylearn2, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | All FitNet parameters were initialized randomly in U(-0.005,0.005). We used stochastic gradient descent with RMSProp (Tieleman & Hinton, 2012) to train the FitNets, with an initial learning rate 0.005 and a mini-batch size of 128. Parameter λ in Eq. (2) was initialized to 4 and decayed linearly during 500 epochs reaching λ = 1. The relaxation term τ was set to 3. |
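
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The following is an illustration only, not the authors' code: the constant names and the helpers `kd_lambda` and `init_weights` are hypothetical, and the choices of 0-indexed epochs and of holding λ at 1 after epoch 500 are assumptions that the quoted text does not specify.

```python
import random
from typing import List

# Values quoted in the Experiment Setup row; names are illustrative.
INIT_RANGE = 0.005      # parameters drawn from U(-0.005, 0.005)
LEARNING_RATE = 0.005   # initial learning rate for RMSProp
BATCH_SIZE = 128        # mini-batch size
LAMBDA_START = 4.0      # initial value of lambda in Eq. (2)
LAMBDA_END = 1.0        # value of lambda reached after the decay
DECAY_EPOCHS = 500      # length of the linear decay, in epochs
TAU = 3.0               # relaxation term tau


def kd_lambda(epoch: int) -> float:
    """Linearly decay lambda from LAMBDA_START to LAMBDA_END.

    Assumption: epochs are 0-indexed and lambda stays at LAMBDA_END
    once DECAY_EPOCHS is reached.
    """
    clamped = min(max(epoch, 0), DECAY_EPOCHS)
    fraction = clamped / DECAY_EPOCHS
    return LAMBDA_START + fraction * (LAMBDA_END - LAMBDA_START)


def init_weights(count: int, seed: int = 0) -> List[float]:
    """Draw `count` parameters uniformly from [-INIT_RANGE, INIT_RANGE]."""
    rng = random.Random(seed)
    return [rng.uniform(-INIT_RANGE, INIT_RANGE) for _ in range(count)]


if __name__ == "__main__":
    for epoch in (0, 250, 500):
        print(epoch, kd_lambda(epoch))
```

Under these assumptions λ is halfway between its endpoints at epoch 250. A real reproduction would pass these values to the training framework's RMSProp optimizer rather than to the standard-library helpers used here.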