Learning by Turning: Neural Architecture Aware Optimisation
Authors: Yang Liu, Jeremy Bernstein, Markus Meister, Yisong Yue
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section presents experiments intended to demonstrate Nero's key properties. In all figures, the mean and range are plotted over three repeats. |
| Researcher Affiliation | Collaboration | Abacus.AI and Caltech. |
| Pseudocode | Yes | Algorithm 1: Nero optimiser. Out-of-the-box hyperparameter defaults are η = 0.01 and β = 0.999. The constant σb ∈ ℝ refers to the initialisation scale of the biases. (An illustrative sketch of the update follows the table.) |
| Open Source Code | Yes | Code available at github.com/jxbz/nero. |
| Open Datasets | Yes | A VGG-11 image classifier on the CIFAR-10 dataset, [...] classify the MNIST dataset. [...] train a language model on the Wikitext-2 dataset, and a larger transformer (121 tensors) trained on WMT 2016 English-German translation. [...] PPO on the Atari Pong video game. [...] ResNet-50 classifier on the ImageNet dataset. |
| Dataset Splits | No | The paper mentions using well-known datasets like CIFAR-10, MNIST, Wikitext-2, and ImageNet, which typically have predefined splits. It also refers to 'validation error' and 'validation results'. However, it does not explicitly state the specific percentages or sample counts for train/validation/test splits used in its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a 'Pytorch implementation' but does not specify version numbers for PyTorch or any other software dependencies, making it difficult to precisely reproduce the environment. |
| Experiment Setup | Yes | For Nero, out-of-the-box refers to setting η = 0.01 and β = 0.999. [...] Learning rates were tuned over {10⁻⁴, 10⁻³, ..., 10⁰}. [...] β in Nero and β₂ in Adam and LAMB were fixed to 0.999 across all experiments. [...] Typical initialisation scales are σb = 1 for gains and σb = 0.01 for biases. (A hypothetical grid-search sketch follows the table.) |
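
Since the table reports both pseudocode (Algorithm 1) and an open-source PyTorch implementation, a minimal sketch of the per-neuron update may help a reader check the reproducibility claims. This is an illustrative reconstruction, not the authors' reference code (that lives at github.com/jxbz/nero); the helpers `neuron_norm` and `neuron_mean`, the plain `state` dict, and the NaN guard are assumptions of this sketch.

```python
import torch

def neuron_norm(p):
    """Per-neuron (per-row) Euclidean norm, shaped to broadcast over p."""
    if p.dim() > 1:
        return p.view(p.shape[0], -1).norm(dim=1).view(-1, *([1] * (p.dim() - 1)))
    return p.abs()

def neuron_mean(p):
    """Per-neuron (per-row) mean, shaped to broadcast over p."""
    return p.view(p.shape[0], -1).mean(dim=1).view(-1, *([1] * (p.dim() - 1)))

@torch.no_grad()
def nero_step(p, state, lr=0.01, beta=0.999):
    """One Nero update on tensor p, whose rows are treated as neurons.

    Paper defaults: lr = 0.01, beta = 0.999. `state` is a dict holding
    the step count and a running average of squared per-neuron gradient
    norms (an assumption of this sketch, not the reference API).
    """
    if not state:                                   # lazy init on first call
        if p.dim() > 1:                             # project onto constraint set:
            p.sub_(neuron_mean(p))                  #   zero mean per neuron,
            p.div_(neuron_norm(p))                  #   unit norm per neuron
        state["step"] = 0
        state["exp_avg_sq"] = torch.zeros_like(neuron_norm(p))
        # For zero-initialised biases, fall back to the paper's sigma_b = 0.01.
        state["scale"] = neuron_norm(p).mean().item() or 0.01
    state["step"] += 1
    g_norm = neuron_norm(p.grad)
    state["exp_avg_sq"].mul_(beta).add_((1 - beta) * g_norm ** 2)
    v_hat = state["exp_avg_sq"] / (1 - beta ** state["step"])  # bias correction
    update = p.grad / v_hat.sqrt()
    update[update != update] = 0.0                  # zero out NaNs (dead neurons)
    p.add_(update, alpha=-lr * state["scale"])
    if p.dim() > 1:                                 # re-project after the step
        p.sub_(neuron_mean(p))
        p.div_(neuron_norm(p))
```

The two projections encode the constraint that gives the method its "architecture aware" character: each neuron's weight vector is kept zero-mean and unit-norm, so the optimiser effectively learns by turning weight vectors rather than rescaling them.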
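
The tuning protocol quoted in the Experiment Setup row is a one-dimensional grid search over learning rates with β held fixed. A hypothetical sketch, where `train_and_evaluate` is a placeholder (not from the paper) for a full training run that returns validation error:

```python
def train_and_evaluate(lr: float, beta: float = 0.999) -> float:
    """Placeholder: train with the given hyperparameters and return
    validation error. Plug in an actual training pipeline here."""
    raise NotImplementedError

# The paper's grid: learning rates in {1e-4, 1e-3, ..., 1e0}, beta fixed at 0.999.
learning_rates = [10.0 ** k for k in range(-4, 1)]

results = {lr: train_and_evaluate(lr) for lr in learning_rates}
best_lr = min(results, key=results.get)
print(f"best learning rate: {best_lr:g}")
```

Since the paper plots the mean and range over three repeats, each grid point would in practice be run three times per seed before the curves in the figures are reported.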