MLI Formula: A Nearly Scale-Invariant Solution with Noise Perturbation

Authors: Bowen Tao, Xin-Chun Li, De-Chuan Zhan

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we extend the experiments in Goodfellow & Vinyals (2015) to modern settings, including VGG-style networks (Simonyan & Zisserman, 2015), ResNets (He et al., 2016) and Batch Normalization (BN) (Ioffe & Szegedy, 2015). We discover that substituting the original initialization θ0 with an unrelated random initialization θ0′ still leads to monotonic decreasing loss curves. To validate our interpretation, we provide empirical analyses on multiple datasets and perform experiments to explain the phenomenon of violating the MLI property under specific mechanisms. In this section, we perform experiments aimed at comprehensively understanding the MLI formula and validating our hypotheses.
Researcher Affiliation | Academia | 1 School of Artificial Intelligence, Nanjing University, China; 2 National Key Laboratory for Novel Software Technology, Nanjing University, China. Correspondence to: De-Chuan Zhan <zhandc@nju.edu.cn>.
Pseudocode | Yes | Listing 1. Monotonic Loss Decreasing Presented by MLI Formula
Open Source Code | No | The paper states: 'We provide a demonstration using scikit-learn to showcase the monotonic decreasing property. The code is listed in Code 1.' While code is presented in Listing 1 within the paper, it is a small demonstration snippet, not a link to a code repository for the full methodology described in the paper, and it is not provided as supplementary material in a distinct external file.
Open Datasets | Yes | We study four image classification settings including a 4-layer fully-connected network (FCN4) for MNIST (LeCun & Cortes, 2010), VGG8 without Batch Normalization (Ioffe & Szegedy, 2015) for SVHN (Netzer et al., 2011), VGG16 with Batch Normalization (Ioffe & Szegedy, 2015) for CIFAR10 (Krizhevsky, 2009), and ResNet20 for CIFAR100 (Krizhevsky, 2009).
Dataset Splits | No | The paper mentions using well-known datasets like MNIST, SVHN, CIFAR10, and CIFAR100. While these datasets have standard splits, the paper does not explicitly state the train/validation/test split percentages, sample counts, or provide citations for predefined splits. It only details training configurations like epochs and learning rates.
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments. It mentions 'modern settings' but provides no specific details such as GPU models, CPU types, or cloud computing instances.
Software Dependencies | No | The paper mentions the use of 'scikit-learn' for a demonstration and imports 'numpy', 'matplotlib', and 'scipy' in the provided code snippet. However, it does not specify version numbers for any of these software components, which is required for a reproducible description.
Experiment Setup | Yes | B.1.1. FCN4 MNIST: ...Each network is trained for 100 epochs using a fixed learning rate 1e-2. B.1.2. VGG8 SVHN: ...We train the network using SGD with momentum 0.9 and weight decay 1e-4 for 100 epochs. For the learning rate, we start from 0.01 and reduce it by a factor of 0.1 at the 60-th and 90-th epoch. B.1.3. VGG16 CIFAR10: ...We train VGG16 with batch normalization on CIFAR10 dataset using SGD optimizer with momentum 0.9 and weight decay 1e-4 for 160 epochs. We initialize the learning rate as 0.1 and decay it by a factor of 0.1 at the 80-th and 120-th epochs. B.1.4. ResNet20 CIFAR100: ...We use ResNet20 for CIFAR100. We train the network using SGD with momentum 0.9 and weight decay 1e-4 for 160 epochs. For the learning rate, we start from 0.1 and reduce it by 0.1 at the milestone of the 60-th, 90-th, and 120-th epoch.
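
The Research Type row above describes the core experiment: evaluating the loss along the straight line θ(α) = (1 − α)·θ0 + α·θT between an initialization (or an unrelated random initialization θ0′) and the trained solution θT, and checking that the curve decreases monotonically in α. The sketch below shows one way such an interpolation curve could be computed in PyTorch; the function name, the state_dict-based blending, and the data-loader argument are illustrative assumptions, not the authors' code.

```python
import copy

import torch


def interpolation_loss_curve(model, theta_0, theta_T, loss_fn, loader,
                             device="cpu", steps=21):
    """Loss along theta(alpha) = (1 - alpha) * theta_0 + alpha * theta_T, alpha in [0, 1].

    theta_0 and theta_T are state_dicts (e.g., an unrelated random initialization
    and the trained solution). The MLI property corresponds to this curve being
    monotonically decreasing in alpha.
    """
    model = copy.deepcopy(model).to(device)
    curve = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Interpolate floating-point tensors only; copy integer buffers
        # (e.g., BatchNorm's num_batches_tracked) from the endpoint as-is.
        blended = {
            k: ((1 - alpha) * theta_0[k] + alpha * theta_T[k])
            if theta_T[k].is_floating_point() else theta_T[k]
            for k in theta_T
        }
        model.load_state_dict(blended)
        model.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(model(x), y).item() * x.size(0)
                count += x.size(0)
        curve.append(total / count)
    return curve
```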
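
The Experiment Setup row quotes the training recipes verbatim. As a concrete reading of the ResNet20/CIFAR100 configuration (B.1.4), a minimal PyTorch sketch of the optimizer and learning-rate schedule is given below; the helper name and the commented training loop are illustrative, and only the hyperparameters (SGD, momentum 0.9, weight decay 1e-4, lr 0.1 decayed by 0.1 at epochs 60, 90, 120, 160 epochs total) come from the paper.

```python
import torch


def resnet20_cifar100_optim(model):
    """SGD and step schedule matching the quoted ResNet20 / CIFAR100 setup."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[60, 90, 120], gamma=0.1)
    return optimizer, scheduler


# Per-epoch usage (train_one_epoch is a hypothetical helper):
# optimizer, scheduler = resnet20_cifar100_optim(model)
# for epoch in range(160):
#     train_one_epoch(model, optimizer)
#     scheduler.step()
```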