All you need is a good init

Authors: Dmytro Mishkin, Jiri Matas

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the state-of-the-art, or very close to it, is achieved on the MNIST, CIFAR-10/100 and ImageNet datasets.
Researcher Affiliation | Academia | Dmytro Mishkin, Jiri Matas, Center for Machine Perception, Czech Technical University in Prague, Czech Republic, {mishkdmy,matas}@cmp.fel.cvut.cz
Pseudocode | Yes | Algorithm 1: Layer-sequential unit-variance orthogonal initialization. L is a convolution or fully-connected layer, W_L its weights, B_L its output blob, Tol_var the variance tolerance, T_i the current trial, and T_max the maximum number of trials. (A minimal sketch of the procedure is given after the table.)
Open Source Code | Yes | The code allowing to reproduce the experiments is available at https://github.com/ducha-aiki/LSUVinit
Open Datasets | Yes | Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the state-of-the-art, or very close to it, is achieved on the MNIST, CIFAR-10/100 and ImageNet datasets.
Dataset Splits | No | The paper mentions validation accuracy (Figures 4 and 5) and uses standard datasets such as MNIST (60,000 images) and CIFAR-10/100 (60,000 images), which typically have predefined splits. However, it does not explicitly state the train/validation/test split percentages or absolute counts, nor does it cite the predefined splits needed to reproduce them.
Hardware Specification | No | The paper discusses computational time and overhead (Table 6, Section 5.4 "Timings") but does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software such as the Caffe framework (Jia et al., 2014) but does not provide version numbers for any of the software dependencies required to reproduce the experiments.
Experiment Setup | Yes | The FitNets are trained with stochastic gradient descent with momentum set to 0.9, the initial learning rate set to 0.01 and reduced by a factor of 10 after the 100th, 150th and 200th epoch, finishing at the 230th epoch. (A hedged sketch of this schedule follows the table.)
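
As a concrete reading of Algorithm 1, here is a minimal NumPy sketch of the LSUV procedure: each layer is pre-initialized with an orthonormal weight matrix (Saxe et al.) and its weights are then rescaled until the variance of its output blob on a minibatch is within Tol_var of 1, for at most T_max trials. The `layers`, `.W`, and `forward_to` interfaces are hypothetical stand-ins used only for illustration; the authors' actual Caffe implementation is at https://github.com/ducha-aiki/LSUVinit.

```python
import numpy as np

def orthonormal_init(shape, rng):
    """Saxe et al. orthonormal pre-initialization: fill with Gaussian noise,
    then replace the (flattened) weight matrix with an orthonormal basis via SVD."""
    flat = (shape[0], int(np.prod(shape[1:])))
    a = rng.standard_normal(flat)
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == flat else vt          # pick the factor with the right shape
    return q.reshape(shape)

def lsuv_init(layers, forward_to, x_batch, tol_var=0.1, t_max=10, seed=0):
    """Layer-sequential unit-variance init, loosely following Algorithm 1.

    `layers` is a list of objects with a mutable weight array `.W`, and
    `forward_to(x_batch, i)` returns the output blob of layer i for the batch.
    Both interfaces are hypothetical and exist only for this sketch.
    """
    rng = np.random.default_rng(seed)
    for i, layer in enumerate(layers):
        layer.W = orthonormal_init(layer.W.shape, rng)     # pre-init with orthonormal matrix
        for _ in range(t_max):                             # at most T_max rescaling trials
            var = float(np.var(forward_to(x_batch, i)))    # Var(B_L) on the minibatch
            if abs(var - 1.0) < tol_var:                   # stop when |Var(B_L) - 1| < Tol_var
                break
            layer.W /= np.sqrt(var)                        # scale W_L by 1 / sqrt(Var(B_L))
    return layers
```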
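
Similarly, a short sketch of the FitNet training schedule quoted in the last row, written with PyTorch's SGD and MultiStepLR purely as an illustration (the paper's experiments were run in Caffe); the tiny `model` below is a placeholder, not the FitNet architecture.

```python
import torch

# Placeholder model (NOT the FitNet architecture), just to make the sketch runnable.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32 * 3, 10))

# SGD with momentum 0.9 and initial learning rate 0.01, as stated in the paper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Reduce the learning rate by a factor of 10 after the 100th, 150th and 200th epoch.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150, 200], gamma=0.1)

for epoch in range(230):          # training finishes at the 230th epoch
    # ... forward/backward passes and optimizer.step() over the training set go here ...
    scheduler.step()              # advance the learning-rate schedule once per epoch
```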