NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations

Authors: Marco Ciccone, Marco Gallieri, Jonathan Masci, Christian Osendorfer, Faustino Gomez

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show how NAIS-Net exhibits stability in practice, yielding a significant reduction in generalization gap compared to ResNets. [...] These implementations are compared experimentally with ResNets on both CIFAR-10 and CIFAR-100 datasets, in section 5, showing that NAIS-Nets achieve comparable classification accuracy with a much better generalization gap.
Researcher Affiliation | Collaboration | Marco Ciccone (Politecnico di Milano / NNAISENSE SA, marco.ciccone@polimi.it); Marco Gallieri (NNAISENSE SA, marco@nnaisense.com); Jonathan Masci (NNAISENSE SA, jonathan@nnaisense.com); Christian Osendorfer (NNAISENSE SA, christian@nnaisense.com); Faustino Gomez (NNAISENSE SA, tino@nnaisense.com)
Pseudocode | Yes | Algorithm 1 "Fully Connected Reprojection" and Algorithm 2 "CNN Reprojection" (a hedged sketch of the fully connected reprojection follows the table).
Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the described methodology.
Open Datasets | Yes | Experiments were conducted comparing NAIS-Net with ResNet, and variants thereof, using both fully-connected (MNIST, section 5.1) and convolutional (CIFAR-10/100, section 5.2) architectures... [27] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. 2009. [32] Y. LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
Dataset Splits | Yes | Experiments were conducted comparing NAIS-Net with ResNet, and variants thereof, using both fully-connected (MNIST, section 5.1) and convolutional (CIFAR-10/100, section 5.2) architectures... For the MNIST dataset [32] a single-block NAIS-Net was compared... These benchmarks are simple enough to allow for multiple runs to test for statistical significance, yet sufficiently complex to require convolutional layers.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'tensorflow/models' in a footnote, but it does not specify version numbers for TensorFlow or any other software libraries or dependencies used in their implementation.
Experiment Setup | Yes | All networks were trained using stochastic gradient descent with momentum 0.9 and learning rate 0.1, for 150 epochs. [...] The initial learning rate of 0.1 was decreased by a factor of 10 at epochs 150, 250 and 350 and the experiments were run for 450 epochs. (A learning-rate-schedule sketch matching this setup follows the table.)
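
The Pseudocode row above names Algorithm 1, "Fully Connected Reprojection". The paper's exact procedure and constants are not reproduced here; the following is only a minimal NumPy sketch of the general idea, assuming the fully connected state matrix is parameterized as A = -R^T R - eps*I and that R is rescaled whenever the Frobenius norm of R^T R exceeds some bound delta. The default values of delta and eps below are illustrative assumptions, not the constants derived in the paper's stability theorem.

import numpy as np

def fc_reprojection(R, delta=1.0, eps=1e-2):
    """Hedged sketch of a Frobenius-norm reprojection in the spirit of
    Algorithm 1 ("Fully Connected Reprojection") in Ciccone et al. (2018).
    `delta` and `eps` are illustrative placeholders, not the paper's constants."""
    gram = R.T @ R
    norm = np.linalg.norm(gram, ord="fro")
    if norm > delta:
        # Rescale R so that ||R^T R||_F is pulled back to the bound delta.
        R = np.sqrt(delta / norm) * R
    # Assumed parameterization of the (stable) state-transition matrix.
    A = -R.T @ R - eps * np.eye(R.shape[1])
    return R, A

# Example: reproject a randomly initialized factor, e.g. after a gradient step.
rng = np.random.default_rng(0)
R, A = fc_reprojection(rng.normal(size=(16, 16)))
print(np.linalg.norm(R.T @ R, ord="fro"))  # <= delta after reprojection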
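
The Experiment Setup row quotes two regimes: a 150-epoch run and a longer run with step decay over 450 epochs. Below is a minimal, pure-Python sketch of the longer step-decay schedule; the function name `step_lr` and the commented Keras hookup are assumptions for illustration, while the numbers (base rate 0.1, momentum 0.9, drops by a factor of 10 at epochs 150, 250 and 350) come from the quote.

def step_lr(epoch, base_lr=0.1, drop_epochs=(150, 250, 350), factor=0.1):
    """Step-decay schedule matching the quoted setup: start at 0.1 and
    divide by 10 at epochs 150, 250 and 350."""
    lr = base_lr
    for boundary in drop_epochs:
        if epoch >= boundary:
            lr *= factor
    return lr

# Sanity check of the quoted boundaries.
assert step_lr(0) == 0.1
assert abs(step_lr(200) - 1e-2) < 1e-12
assert abs(step_lr(400) - 1e-4) < 1e-12

# If the model were built with tf.keras (the paper's footnote points to
# tensorflow/models, version unspecified), the schedule could be attached as:
#   optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
#   callbacks = [tf.keras.callbacks.LearningRateScheduler(step_lr)]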