Using Statistics to Automate Stochastic Optimization

Authors: Hunter Lang, Lin Xiao, Pengchuan Zhang

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on several deep learning tasks demonstrate that this statistical adaptive stochastic approximation (SASA) method can automatically find good learning rate schedules and match the performance of hand-tuned methods using default settings of its parameters.
Researcher Affiliation | Industry | Hunter Lang, Pengchuan Zhang, Lin Xiao; Microsoft Research AI, Redmond, WA 98052, USA; {hunter.lang, penzhan, lin.xiao}@microsoft.com
Pseudocode | Yes | Algorithm 1: General SASA method; Algorithm 2: SASA; Algorithm 3: Test
Open Source Code | No | The paper cites 'Pytorch word language model. https://github.com/pytorch/examples/tree/master/word_language_model, 2019.', which is a third-party example, but it does not provide access to the source code for the SASA methodology described in the paper.
Open Datasets | Yes | We trained an 18-layer ResNet model (He et al., 2016) on CIFAR-10 (Krizhevsky and Hinton, 2009); ImageNet (Deng et al., 2009); we train the PyTorch word-level language model example (2019) on the Wikitext-2 dataset (Merity et al., 2016).
Dataset Splits | Yes | We compare against SGM and Adam with (global) learning rate tuned using a validation set. These baselines drop the learning rate by a factor of 4 when the validation loss stops improving.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions the 'PyTorch word-level language model example (2019)', which implies PyTorch, but no specific version number for PyTorch or other libraries is provided.
Experiment Setup | Yes | For all experiments, we use default values δ = 0.02 and γ = 0.2. In each experiment, we use the same α₀ and β as for the best SGM baseline. We use weight decay in every experiment... SGM-hand uses α₀ = 1.0 and β = 0.9 and drops α by a factor of 10 (ζ = 0.1) every 50 epochs. SASA uses γ = 0.2 and δ = 0.02, as always. Adam has a tuned global learning rate α₀ = 0.0001 and a tuned warmup phase of 50 epochs... We trained an 18-layer ResNet model... with random cropping and random horizontal flipping for data augmentation and weight decay 0.0005. ... We used 600-dimensional embeddings, 600 hidden units, tied weights, dropout 0.65, and gradient clipping with threshold 2.0.
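
The Pseudocode row above lists Algorithms 1-3, but no reference implementation is released (see the Open Source Code row). Below is a minimal, illustrative sketch in Python of the kind of confidence-interval equivalence test Algorithm 3 performs on running statistics of the iterates, using the default δ = 0.02 and γ = 0.2 quoted in the Experiment Setup row. The statistic passed in, the scale term, and the function name sasa_should_drop are simplifications and assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy import stats

def sasa_should_drop(z_samples, gamma=0.2, delta=0.02, conf_level=0.95):
    """Illustrative stand-in for SASA's stationarity test (Algorithm 3).

    z_samples: per-iteration statistics whose mean should be near zero
    once the iterates reach their stationary distribution (the exact
    statistic is defined in the paper; this sketch only shows the test
    mechanics). gamma keeps the most recent fraction of samples and
    delta sets the relative equivalence threshold.
    """
    z = np.asarray(z_samples, dtype=float)
    n = len(z)
    z = z[int((1.0 - gamma) * n):]          # keep only the last gamma-fraction
    if len(z) < 2:
        return False
    mean = z.mean()
    half_width = stats.t.ppf(0.5 + conf_level / 2.0, df=len(z) - 1) * stats.sem(z)
    lo, hi = mean - half_width, mean + half_width
    # Equivalence test: the confidence interval must lie inside a band of
    # half-width delta times a scale term (simplified here as the mean
    # absolute statistic -- an assumption, not necessarily the paper's choice).
    scale = np.abs(z).mean() + 1e-12
    return (-delta * scale < lo) and (hi < delta * scale)
```

In the general SASA loop (Algorithm 1), a positive test result triggers a learning-rate drop by a constant factor, after which statistics are collected afresh for the new phase; the excerpt quotes a factor-of-10 drop for the hand-tuned SGM baseline.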
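
The Dataset Splits row describes validation-tuned SGM and Adam baselines that drop the learning rate by a factor of 4 when the validation loss stops improving. A minimal sketch of such a baseline schedule in PyTorch (the framework implied by the Software Dependencies row) uses ReduceLROnPlateau with factor=0.25; the patience value and the dummy model and loop are assumptions added only to keep the snippet self-contained.

```python
import torch

# Dummy model and optimizer just to make the snippet self-contained;
# the real models are the ResNet-18 / word-level LSTM setups quoted above.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Baseline behaviour from the Dataset Splits row: drop the (global)
# learning rate by a factor of 4 when validation loss stops improving.
# The patience value is an assumption; the excerpt does not give one.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.25, patience=5)

for epoch in range(100):
    # ... one epoch of training would go here ...
    val_loss = 1.0 / (epoch + 1)     # stand-in for the measured validation loss
    scheduler.step(val_loss)         # scheduler reacts only to validation loss
```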
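
The Experiment Setup row pins down most of the CIFAR-10 baseline hyperparameters. Below is a sketch of that configuration in PyTorch, assuming the standard torchvision ResNet-18 and the usual CIFAR-10 crop padding of 4 (not stated in the excerpt); mapping α₀ and β directly onto torch.optim.SGD's lr and momentum is also an approximation, since the paper's SGM update may be parameterized differently.

```python
import torch
import torchvision
import torchvision.transforms as T

# Data augmentation quoted above: random cropping and random horizontal
# flipping. The crop padding of 4 is an assumption (common CIFAR-10
# practice); it is not stated in the excerpt.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)

# Stand-in for the paper's 18-layer ResNet (the torchvision variant
# differs slightly from CIFAR-specific ResNet-18 implementations).
model = torchvision.models.resnet18(num_classes=10)

# SGM-hand baseline from the Experiment Setup row: α₀ = 1.0, β = 0.9,
# weight decay 0.0005, learning rate dropped by a factor of 10 every 50 epochs.
optimizer = torch.optim.SGD(
    model.parameters(), lr=1.0, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
```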