Stochastic Anderson Mixing for Nonconvex Stochastic Optimization

Authors: Fuchao Wei, Chenglong Bao, Yang Liu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we apply the SAM method to train various neural networks including the vanilla CNN, ResNets, WideResNet, ResNeXt, DenseNet and LSTM. Experimental results on image classification and language model demonstrate the advantages of our method.
Researcher Affiliation | Academia | Fuchao Wei (1), Chenglong Bao (3,4), Yang Liu (1,2); 1: Department of Computer Science and Technology, Tsinghua University; 2: Institute for AI Industry Research, Tsinghua University; 3: Yau Mathematical Sciences Center, Tsinghua University; 4: Yanqi Lake Beijing Institute of Mathematical Sciences and Applications
Pseudocode | Yes | Algorithm 1: Stochastic Anderson Mixing (SAM). (A generic Anderson-mixing update is sketched in code after the table.)
Open Source Code | No | The paper does not include an unambiguous statement or a direct link to a source-code repository for the methodology described in this paper.
Open Datasets | Yes | The datasets were MNIST [32], CIFAR-10/CIFAR-100 [31] for image classification and Penn Treebank [35] for language model.
Dataset Splits | Yes | The training dataset was preprocessed by randomly selecting 12k images from the total 60k images to facilitate large mini-batch training. Neither weight decay nor dropout was used. [...] For CIFAR-10 and CIFAR-100, both datasets have 50K images for training and 10K images for test. [...] reported the perplexity on the validation set in Figure 3 and the perplexity on the test set in Table 3. (A subset-selection sketch for the 12k MNIST split follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | Footnote 3 mentions "Based on the official PyTorch implementation https://github.com/pytorch/examples/blob/master/mnist." This names PyTorch but does not specify a version number.
Experiment Setup | Yes | The learning rate was tuned and fixed for each optimizer. The history lengths for SdLBFGS, RAM and AdaSAM were set to 20; δ = 10^-6 for RAM and c1 = 10^-4 for AdaSAM. [...] We trained 160 epochs with a batch size of 128 and decayed the learning rate at the 80th and 120th epoch. For AdaSAM/RAM, αk and βk were decayed at the 80th and 120th epoch. (A PyTorch sketch of this schedule follows the table.)
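
The Pseudocode row points to Algorithm 1 (SAM). As a rough reference only, the NumPy sketch below implements a generic damped, regularized Anderson-mixing step driven by a stochastic gradient. The function name, the history matrices dX/dR, the damping parameter alpha, the mixing parameter beta, and the regularization delta are assumptions chosen to mirror the quantities mentioned in the table (history length, αk, βk, δ); it is not a transcription of the authors' exact Algorithm 1.

```python
# A minimal sketch of one stochastic Anderson-mixing-style step, written from
# the generic (regularized, damped) Anderson mixing update. Not the authors'
# exact Algorithm 1; all parameter names and defaults are illustrative.
import numpy as np

def anderson_mixing_step(x, grad, dX, dR, alpha=1.0, beta=0.1, delta=1e-6):
    """One Anderson-mixing-style step.

    x     : current iterate, shape (n,)
    grad  : stochastic gradient at x, shape (n,)
    dX    : recent iterate differences, shape (n, m) (m may be 0)
    dR    : matching residual differences, shape (n, m)
    alpha : damping on the extrapolation (history) term
    beta  : mixing parameter on the residual term
    delta : Tikhonov regularization for the least-squares coefficients,
            as in regularized Anderson mixing variants
    """
    r = -grad                      # residual of the gradient fixed-point map
    m = dX.shape[1]
    if m == 0:                     # no history yet: plain gradient-like step
        return x + beta * r
    # Regularized least squares: Gamma = argmin ||r - dR @ Gamma||^2 + delta*||Gamma||^2
    A = dR.T @ dR + delta * np.eye(m)
    gamma = np.linalg.solve(A, dR.T @ r)
    # Damped Anderson mixing update
    return x + beta * r - (alpha * dX + beta * dR) @ gamma
```

In use, dX and dR would be maintained by appending the latest differences x_k - x_{k-1} and r_k - r_{k-1} after each step and keeping at most a fixed number of columns, e.g. 20, matching the history length reported in the Experiment Setup row.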
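For the Dataset Splits row, the following torchvision sketch shows one plausible way to realize the reported MNIST preprocessing (randomly keeping 12k of the 60k training images). The data root, transform, and seed are illustrative assumptions, not details taken from the paper.

```python
# Sketch: random 12k-image subset of the 60k MNIST training images.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

full_train = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
test_set = datasets.MNIST("data", train=False, download=True,
                          transform=transforms.ToTensor())

torch.manual_seed(0)                                    # assumed seed
subset_idx = torch.randperm(len(full_train))[:12_000].tolist()
train_set = Subset(full_train, subset_idx)              # 12k-image training subset
print(len(train_set), len(test_set))                    # 12000 10000
```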
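For the Experiment Setup row, the following PyTorch sketch makes the reported schedule concrete: 160 epochs, batch size 128, and learning-rate decay at the 80th and 120th epochs via MultiStepLR. The stand-in linear model, the random tensors, the base learning rate 0.1, and the decay factor 0.1 are assumptions; the table only states that the rate was tuned per optimizer and then decayed.

```python
# Sketch of the reported training schedule; model and data are stand-ins so it runs.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(32 * 32 * 3, 10)                       # placeholder model
data = TensorDataset(torch.randn(1024, 32 * 32 * 3),     # placeholder "images"
                     torch.randint(0, 10, (1024,)))      # placeholder labels
loader = DataLoader(data, batch_size=128, shuffle=True)  # batch size 128

# Base lr 0.1 and gamma 0.1 are assumptions; decay milestones follow the table.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[80, 120], gamma=0.1)

for epoch in range(160):                                 # 160 training epochs
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()                                     # decay at epochs 80 and 120
```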