On Monotonic Linear Interpolation of Neural Network Parameters

Authors: James R Lucas, Juhan Bae, Michael R Zhang, Stanislav Fort, Richard Zemel, Roger B Grosse

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extending this work, we evaluate several hypotheses for this property that, to our knowledge, have not yet been explored. Using tools from differential geometry, we draw connections between the interpolated paths in function space and the monotonicity of the network, providing sufficient conditions for the MLI property under mean squared error. While the MLI property holds under various settings (e.g., network architectures and learning problems), we show in practice that networks violating the MLI property can be produced systematically, by encouraging the weights to move far from initialization. ... To address these questions, we provide an expanded empirical and theoretical study of this phenomenon.
Researcher Affiliation | Academia | James Lucas (1,2), Juhan Bae (1,2), Michael R. Zhang (1,2), Stanislav Fort (3), Richard Zemel (1,2), Roger Grosse (1,2). 1 University of Toronto, 2 Vector Institute, 3 Stanford University. Correspondence to: James Lucas <jlucas@cs.toronto.edu>.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a direct link to open-source code for the described methodology or an explicit statement of its release.
Open Datasets | Yes | For the reconstruction tasks, we trained fully-connected deep autoencoders on MNIST (LeCun et al., 2010). For the classification tasks, we trained networks on MNIST, Fashion-MNIST (Xiao et al., 2017), CIFAR-10, and CIFAR-100 (Krizhevsky et al., 2009). ... We provide a short study on the language modeling setting as well by training LSTM (Hochreiter & Schmidhuber, 1997b) and Transformer (Vaswani et al., 2017) architectures on the WikiText-2 (Merity et al., 2016) dataset. We also experimented with RoBERTa (Liu et al., 2019) on the Esperanto (Conneau et al., 2019) dataset.
Dataset Splits | No | The paper mentions a 'training set' and 'held-out datasets' but does not specify exact training/validation/test splits, percentages, or sample counts needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions the 'Hugging Face library' but does not provide version numbers for the key software components or libraries required for replication.
Experiment Setup | Yes | For all experiments, unless specified otherwise, we discretize the interval [0, 1] using 50 uniform steps. ... We trained autoencoders with SGD and Adam, with varying learning rates and varying hidden layer sizes. ... Learning rates: 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0. ... We also varied the distribution over initial parameters and whether or not batch normalization was applied. The results for CIFAR-10 are displayed in Table 2 (CIFAR-100 results are similar, and are presented in Appendix C). The column headers BN and NBN indicate batch normalization and no batch normalization, respectively. The suffixes I and F indicate two alternative initialization schemes: block-identity initialization (Goyal et al., 2017) and Fixup initialization (Zhang et al., 2019b). (A minimal sketch of the interpolation procedure follows the table.)
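The rows above summarize the paper's core measurement: after training, the loss is evaluated along the straight line in weight space between the initial parameters and the trained parameters, with the interpolation coefficient discretized into 50 uniform steps on [0, 1]; the MLI property holds if the loss never increases along that path. Below is a minimal sketch of that procedure, assuming a PyTorch model and a mean-reduced loss function. The helper names (evaluate_linear_path, is_monotonic) and the tolerance argument are illustrative assumptions, not the authors' released code.

```python
import torch

def evaluate_linear_path(model, theta_0, theta_1, loss_fn, data_loader, steps=50):
    """Evaluate the average loss along the line segment between two parameter
    settings: theta_0 (the initialization) and theta_1 (the trained solution).
    theta_0 and theta_1 are lists of tensors aligned with model.parameters()."""
    alphas = torch.linspace(0.0, 1.0, steps)  # 50 uniform steps on [0, 1]
    losses = []
    with torch.no_grad():
        for alpha in alphas:
            # theta(alpha) = (1 - alpha) * theta_0 + alpha * theta_1
            for p, w0, w1 in zip(model.parameters(), theta_0, theta_1):
                p.copy_((1 - alpha) * w0 + alpha * w1)
            # Average the (mean-reduced) loss over the full dataset at this alpha.
            total, count = 0.0, 0
            for inputs, targets in data_loader:
                total += loss_fn(model(inputs), targets).item() * inputs.size(0)
                count += inputs.size(0)
            losses.append(total / count)
    return losses

def is_monotonic(losses, tol=0.0):
    """The MLI property holds on this path if the loss never increases
    (by more than a small numerical tolerance) from one step to the next."""
    return all(later - earlier <= tol for earlier, later in zip(losses, losses[1:]))
```

In this sketch, theta_0 would be a copy of the parameters saved at initialization and theta_1 the parameters after training; the same discretization can be applied to a held-out loss as well. A small positive tol can be used to ignore numerically insignificant increases along the path.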