Multiplicative Noise and Heavy Tails in Stochastic Optimization
Authors: Liam Hodgkinson, Michael Mahoney
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical results are obtained characterizing this for a large class of (non-linear and even non-convex) models and optimizers (including momentum, Adam, and stochastic Newton), demonstrating that this phenomenon holds generally. Furthermore, we empirically illustrate how multiplicative noise and heavy-tailed structure improve capacity for basin hopping and exploration of non-convex loss surfaces, over commonly considered stochastic dynamics with only additive noise and light-tailed structure. Numerical experiments are conducted in Section 5, illustrating how multiplicative noise and heavy-tailed stationary behaviour improve the capacity for basin hopping (relative to light-tailed stationary behaviour) in the exploratory phase of learning. |
| Researcher Affiliation | Academia | ICSI and Department of Statistics, University of California, Berkeley, USA. |
| Pseudocode | No | The paper describes algorithms and formulations using mathematical equations and textual descriptions, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links to source code repositories or explicitly state that the code for the described methodology is publicly available. |
| Open Datasets | Yes | To see this, we consider fitting a two-layer neural network with 16 hidden units for classification of the Musk data set (Dietterich et al., 1997) (168 attributes; 6598 instances) with cross-entropy loss without regularization and step size γ = 10⁻². We plot histograms of four common wide ResNet architectures trained on CIFAR10 in Figure 3, and provide maximum likelihood estimates of the tail exponents. |
| Dataset Splits | No | The paper mentions using the Musk dataset and CIFAR10 but does not specify the train/validation/test splits (e.g., percentages or exact counts) used for the experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions 'powerlaw: a Python package' in its references, but it does not specify any software dependencies with version numbers used for running its own experiments. |
| Experiment Setup | Yes | For fixed step size γ = 10⁻² and initial w₀ = 4.75, the distribution of 10⁶ successive iterates are presented in Figure 1 for small (σ = 2), moderate (σ = 12), and strong (σ = 50) noise. ... with cross-entropy loss without regularization and step size γ = 10⁻². Two stochastic optimizers are compared: (a) SGD with a single sample per batch (without replacement), and (b) perturbed GD (Jin et al., 2017), where the state-independent covariance of iterations in (b) is chosen to approximate that of (a) on average. |
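
The Experiment Setup row quotes a one-dimensional example (step size γ = 10⁻², initial w₀ = 4.75, 10⁶ iterates, noise levels σ ∈ {2, 12, 50}) but not the loss function or the exact way σ enters the dynamics. The sketch below is a minimal, illustrative stand-in, assuming a quadratic loss f(w) = w²/2 and Gaussian noise that perturbs the gradient both multiplicatively (state-dependently) and additively; the Hill estimate of the tail exponent is a simple proxy for the maximum-likelihood tail-exponent estimates mentioned in the Open Datasets row.

```python
import numpy as np

rng = np.random.default_rng(0)

gamma = 1e-2     # step size gamma = 10^-2, as quoted
w0 = 4.75        # initial iterate, as quoted
n_iters = 10**6  # number of successive iterates, as quoted (a few seconds per chain)

def run_chain(sigma):
    """Iterate w_{k+1} = w_k - gamma * (grad(w_k) + sigma * xi_k * w_k + eta_k),
    i.e. gradient descent on the assumed quadratic f(w) = w^2 / 2 with
    multiplicative (state-dependent) and additive Gaussian noise."""
    w = w0
    out = np.empty(n_iters)
    for k in range(n_iters):
        xi, eta = rng.standard_normal(2)
        w = w - gamma * (w + sigma * xi * w + eta)
        out[k] = w
    return out

def hill_tail_index(samples, k=1000):
    """Hill estimator of the tail exponent from the k largest |samples|."""
    x = np.sort(np.abs(samples))
    return k / np.sum(np.log(x[-k:] / x[-k - 1]))

for sigma in (2, 12, 50):
    ws = run_chain(sigma)
    print(f"sigma = {sigma:>2}: max |w| = {np.abs(ws).max():.3g}, "
          f"Hill tail index ~ {hill_tail_index(ws):.2f}")
```

In this toy recursion, stronger multiplicative noise makes the random contraction factor 1 - γ(1 + σξ_k) exceed 1 in magnitude more often, which is the mechanism behind the heavy-tailed stationary behaviour the paper analyzes.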
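
Similarly, the comparison of (a) single-sample SGD with (b) perturbed GD on the two-layer, 16-hidden-unit Musk network is only described at a high level in the quoted text. Below is a rough PyTorch sketch under stated assumptions: placeholder tensors stand in for the Musk data (which would have to be loaded separately), binary labels are assumed, the perturbed-GD iteration count is arbitrary, and a fixed noise scale replaces the paper's covariance-matched perturbation, which would require estimating the average covariance of the SGD gradient noise in (a).

```python
import torch
import torch.nn as nn

# Placeholder data: the real experiment uses the UCI Musk data set
# (168 attributes; 6598 instances); binary labels are assumed here.
n, d, hidden, n_classes = 6598, 168, 16, 2
X, y = torch.randn(n, d), torch.randint(0, n_classes, (n,))

def make_model():
    return nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))

loss_fn = nn.CrossEntropyLoss()
gamma = 1e-2  # step size, as quoted

# (a) SGD with a single sample per batch, drawn without replacement (one pass).
model_a = make_model()
opt_a = torch.optim.SGD(model_a.parameters(), lr=gamma)
for i in torch.randperm(n).tolist():
    opt_a.zero_grad()
    loss_fn(model_a(X[i : i + 1]), y[i : i + 1]).backward()
    opt_a.step()

# (b) Perturbed GD: full-batch gradient plus state-independent Gaussian noise.
# noise_scale is an arbitrary stand-in; the paper instead chooses the noise
# covariance to approximate the average covariance of the SGD noise in (a).
model_b = make_model()
opt_b = torch.optim.SGD(model_b.parameters(), lr=gamma)
noise_scale = 1e-2
n_pgd_steps = 1000  # iteration count not specified in the quoted text
for _ in range(n_pgd_steps):
    opt_b.zero_grad()
    loss_fn(model_b(X), y).backward()
    with torch.no_grad():
        for p in model_b.parameters():
            p.grad.add_(noise_scale * torch.randn_like(p.grad))
    opt_b.step()
```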