A Class of Short-term Recurrence Anderson Mixing Methods and Their Applications

Authors: Fuchao Wei, Chenglong Bao, Yang Liu

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that ST-AM is competitive with the long-memory AM and outperforms many existing optimizers.
Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University; (2) Institute for AI Industry Research, Tsinghua University; (3) Yau Mathematical Sciences Center, Tsinghua University; (4) Yanqi Lake Beijing Institute of Mathematical Sciences and Applications
Pseudocode | Yes | Algorithm 1: RST-AM for stochastic programming
Open Source Code | No | The paper does not provide any links to open-source code or explicit statements about code availability.
Open Datasets | Yes | We applied RST-AM to train neural networks, with full-batch training on MNIST (LeCun et al., 1998), and mini-batch training on CIFAR-10/CIFAR-100 and Penn Treebank (Marcus et al., 1993).
Dataset Splits | Yes | We randomly selected 5K images from the total 50K training dataset as the validation dataset (the other 45K images remained as the training dataset), and chose the best checkpoint model on the validation set to evaluate on the test dataset.
Hardware Specification | Yes | Our main codes were written based on the PyTorch framework and one GeForce RTX 2080 Ti GPU was used for the tests in training neural networks.
Software Dependencies | No | Our main codes were written based on the PyTorch framework.
Experiment Setup | Yes | The optimizer was Adam with a learning rate of 0.001 and the weight decay was 2.5 × 10⁻⁶. The batch size was 128 and the number of epochs was 50.
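The table only names the paper's pseudocode (Algorithm 1: RST-AM) without reproducing it. For orientation, below is a minimal sketch of classical long-memory Anderson mixing for a fixed-point problem x = g(x), the baseline that short-term variants such as ST-AM/RST-AM are designed to approximate with a constant-size history. This is not the paper's algorithm; the function name, damping parameter, and window size are illustrative assumptions.

```python
import numpy as np


def anderson_mixing(g, x0, m=5, beta=1.0, max_iter=100, tol=1e-8):
    """Classical (long-memory) Anderson mixing for the fixed-point problem x = g(x).

    Keeps up to `m` past iterates/residuals and combines them via a small
    least-squares problem. Illustrative only: the paper's ST-AM/RST-AM replaces
    this growing history with a two-vector short-term recurrence.
    """
    x = np.asarray(x0, dtype=float)
    X_hist, R_hist = [], []
    for _ in range(max_iter):
        gx = g(x)
        r = gx - x                      # residual of the fixed-point map
        X_hist.append(x.copy())
        R_hist.append(r.copy())
        if np.linalg.norm(r) < tol:
            break
        if len(R_hist) == 1:
            x = x + beta * r            # plain damped fixed-point step
        else:
            # Differences of stored iterates and residuals over the memory window.
            dX = np.column_stack([X_hist[i + 1] - X_hist[i] for i in range(len(X_hist) - 1)])
            dR = np.column_stack([R_hist[i + 1] - R_hist[i] for i in range(len(R_hist) - 1)])
            gamma, *_ = np.linalg.lstsq(dR, r, rcond=None)
            x = x + beta * r - (dX + beta * dR) @ gamma
        X_hist, R_hist = X_hist[-m:], R_hist[-m:]   # drop old history
    return x


# Example usage: componentwise fixed point of cos(x), which converges to ~0.739.
x_star = anderson_mixing(np.cos, x0=np.ones(3))
```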
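The dataset-split and experiment-setup rows describe a concrete configuration (45K/5K CIFAR-10 train/validation split, Adam with learning rate 0.001, weight decay 2.5 × 10⁻⁶, batch size 128, 50 epochs). The sketch below reconstructs that configuration in PyTorch under stated assumptions: the model (ResNet-18), transforms, random seed, and data paths are placeholders, not details taken from the paper, and the quoted Adam hyperparameters may correspond to only one of the paper's experiments.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

# 5K of the 50K CIFAR-10 training images held out for validation (45K remain for training).
transform = transforms.ToTensor()
full_train = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
train_set, val_set = random_split(
    full_train, [45_000, 5_000], generator=torch.Generator().manual_seed(0)  # seed is an assumption
)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size 128 as quoted
val_loader = DataLoader(val_set, batch_size=128)
test_loader = DataLoader(test_set, batch_size=128)

# Optimizer settings quoted in the table; the architecture is a placeholder.
model = models.resnet18(num_classes=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=2.5e-6)
num_epochs = 50
```

As described in the table, one would track validation accuracy each epoch, keep the best checkpoint, and report its accuracy on the test set.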