All you need is a good init
Authors: Dmytro Mishkin, Jiri Matas
ICLR 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the state-of-the-art, or very close to it, is achieved on the MNIST, CIFAR-10/100 and ImageNet datasets. |
| Researcher Affiliation | Academia | Dmytro Mishkin, Jiri Matas, Center for Machine Perception, Czech Technical University in Prague, Czech Republic. {mishkdmy,matas}@cmp.fel.cvut.cz |
| Pseudocode | Yes | Algorithm 1: Layer-sequential unit-variance orthogonal initialization. L: convolution or fully-connected layer; W_L: its weights; B_L: its output blob; Tol_var: variance tolerance; T_i: current trial; T_max: max number of trials. A code sketch of the algorithm follows this table. |
| Open Source Code | Yes | The code allowing to reproduce the experiments is available at https://github.com/ducha-aiki/LSUVinit |
| Open Datasets | Yes | Performance is evaluated on GoogLeNet, CaffeNet, FitNets and Residual nets and the state-of-the-art, or very close to it, is achieved on the MNIST, CIFAR-10/100 and ImageNet datasets. |
| Dataset Splits | No | The paper mentions 'validation accuracy' (Figures 4 and 5) and uses standard datasets such as MNIST (60,000 images) and CIFAR-10/100 (60,000 images), which typically come with predefined splits. However, it does not explicitly state train/validation/test split percentages or absolute counts, nor does it cite the predefined splits it relies on. |
| Hardware Specification | No | The paper discusses computational time and overhead (Table 6, Section 5.4 'TIMINGS') but does not specify any particular hardware components (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as the Caffe framework (Jia et al., 2014) but does not provide version numbers for any of the software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | The FitNets are trained with stochastic gradient descent with momentum set to 0.9, the initial learning rate set to 0.01 and reduced by a factor of 10 after the 100th, 150th and 200th epochs, finishing at the 230th epoch. A code sketch of this schedule also follows this table. |
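
Algorithm 1 (LSUV) from the paper is compact enough to sketch in code. Below is a minimal PyTorch re-implementation sketch, not the authors' Caffe code (that lives in the linked repository); it assumes a model built from `Conv2d`/`Linear` layers and a single representative data batch, and the function name `lsuv_init`, the defaults `tol_var=0.1` and `t_max=10`, and the hook-based capture of each output blob are our assumptions for illustration.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def lsuv_init(model, batch, tol_var=0.1, t_max=10):
    """Sketch of layer-sequential unit-variance (LSUV) initialization.

    Pre-initializes each Conv2d/Linear layer with orthonormal weights
    (Saxe et al., 2014), then rescales each layer's weights so that the
    variance of its output blob B_L on one data batch is close to 1.
    """
    layers = [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    for layer in layers:
        nn.init.orthogonal_(layer.weight)        # step 1: orthonormal W_L

    for layer in layers:                          # step 2: layer-sequential rescaling
        captured = {}

        def hook(_, __, output):
            captured["out"] = output              # capture the output blob B_L

        handle = layer.register_forward_hook(hook)
        for _ in range(t_max):                    # at most T_max trials T_i
            model(batch)
            var = captured["out"].var().item()
            if abs(var - 1.0) < tol_var:          # stop when |Var(B_L) - 1| < Tol_var
                break
            layer.weight.mul_(1.0 / var ** 0.5)   # W_L <- W_L / sqrt(Var(B_L))
        handle.remove()
    return model
```

The layer-sequential structure matters: each rescaling changes the inputs of all later layers, so layers must be processed in forward order rather than independently.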
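
The quoted FitNets setup maps directly onto a standard step-decay schedule. Here is a minimal sketch using PyTorch's `MultiStepLR`; only the momentum (0.9), initial learning rate (0.01), decay epochs (100, 150, 200) and total epoch count (230) come from the paper, while the function name and the `train_one_epoch` helper are hypothetical.

```python
import torch

def make_optimizer_and_scheduler(model):
    # SGD with momentum 0.9 and initial learning rate 0.01, as quoted above.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    # The learning rate is divided by 10 after epochs 100, 150 and 200.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 150, 200], gamma=0.1)
    return optimizer, scheduler

# Training runs for 230 epochs in total:
# for epoch in range(230):
#     train_one_epoch(model, optimizer)  # hypothetical training-loop helper
#     scheduler.step()
```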