A Class of Short-term Recurrence Anderson Mixing Methods and Their Applications
Authors: Fuchao Wei, Chenglong Bao, Yang Liu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that ST-AM is competitive with the long-memory AM and outperforms many existing optimizers. |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University; (2) Institute for AI Industry Research, Tsinghua University; (3) Yau Mathematical Sciences Center, Tsinghua University; (4) Yanqi Lake Beijing Institute of Mathematical Sciences and Applications |
| Pseudocode | Yes | Algorithm 1: RST-AM for stochastic programming. (A background sketch of generic Anderson mixing appears after this table.) |
| Open Source Code | No | The paper does not provide any links to open-source code or explicit statements about code availability. |
| Open Datasets | Yes | We applied RST-AM to train neural networks, with full-batch training on MNIST (LeCun et al., 1998), and mini-batch training on CIFAR-10/CIFAR-100 and Penn Treebank (Marcus et al., 1993). |
| Dataset Splits | Yes | We randomly selected 5K images from the total 50K training dataset as the validation dataset (the other 45K images remained as the training dataset), and chose the best checkpoint model on the validation set to evaluate on the test dataset. (See the split sketch after this table.) |
| Hardware Specification | Yes | Our main codes were written based on the PyTorch framework, and one GeForce RTX 2080 Ti GPU was used for the tests in training neural networks. |
| Software Dependencies | No | Our main codes were written based on the PyTorch framework. |
| Experiment Setup | Yes | The optimizer was Adam with a learning rate of 0.001 and a weight decay of 2.5 × 10⁻⁶. The batch size was 128 and the number of epochs was 50. (See the training-setup sketch after this table.) |
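
The paper's Algorithm 1 (RST-AM) is not reproduced in this report. As background for the pseudocode row above, here is a minimal sketch of *generic* type-II Anderson mixing with a short memory (m = 2, the short-memory regime the paper studies). The function names, damping, and memory size are illustrative assumptions, not the authors' ST-AM/RST-AM updates.

```python
import numpy as np

def anderson_mixing(g, x0, m=2, beta=1.0, tol=1e-8, max_iter=100):
    """Textbook type-II Anderson mixing for the fixed-point problem x = g(x).

    NOT the paper's ST-AM/RST-AM; m=2 mimics its short-memory setting.
    """
    x = np.asarray(x0, dtype=float)
    f = g(x) - x                      # residual of the fixed-point map
    X, F = [], []                     # histories of iterate/residual differences
    for _ in range(max_iter):
        if np.linalg.norm(f) < tol:
            break
        x_new = x + beta * f          # plain (damped) fixed-point step
        if X:
            # mix in past differences: min_gamma || f - F_mat @ gamma ||
            F_mat = np.stack(F, axis=1)
            X_mat = np.stack(X, axis=1)
            gamma, *_ = np.linalg.lstsq(F_mat, f, rcond=None)
            x_new = x + beta * f - (X_mat + beta * F_mat) @ gamma
        f_new = g(x_new) - x_new
        X.append(x_new - x)
        F.append(f_new - f)
        if len(X) > m:                # short memory: keep only m past pairs
            X.pop(0)
            F.pop(0)
        x, f = x_new, f_new
    return x

# Example: componentwise fixed point of cos(x).
x_star = anderson_mixing(lambda x: np.cos(x), np.zeros(3), m=2)
```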
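A minimal sketch of the 45K/5K train/validation split quoted above, assuming torchvision's CIFAR-10 loader; the fixed seed and the transform are assumptions, since the paper only states that the split was random.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# 45K/5K train/validation split of the 50K CIFAR-10 training set.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
train_set, val_set = random_split(
    full_train, [45_000, 5_000],
    generator=torch.Generator().manual_seed(0))  # seed is an assumption
```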
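A minimal sketch of the quoted Adam baseline (learning rate 0.001, weight decay 2.5 × 10⁻⁶, batch size 128, 50 epochs). The model and data below are stand-ins so the snippet runs on its own; the paper trains real networks on MNIST, CIFAR-10/100, and Penn Treebank.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data and model; only the hyperparameters are taken from the paper.
data = TensorDataset(torch.randn(1024, 784), torch.randint(0, 10, (1024,)))
loader = DataLoader(data, batch_size=128, shuffle=True)

model = torch.nn.Linear(784, 10)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=2.5e-6)

for epoch in range(50):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
```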