Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

Authors: Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Through comprehensive analysis in simple settings, we illustrate the efficacy of our approach in increasing exploration and bias towards flatter minima. We empirically demonstrate that it can improve model performance for image classification on ImageNet and CIFAR10/100, language modelling on Penn Treebank, and online learning tasks on Tiny ImageNet and 5-dataset. Our code is available at https://github.com/chandar-lab/CMOptimizer.
Researcher Affiliation Collaboration Pranshu Malviya EMAIL, Mila Quebec AI Institute, Polytechnique Montreal; Gonçalo Mordido EMAIL, Mila Quebec AI Institute, Polytechnique Montreal; Aristide Baratin EMAIL, Samsung SAIT AI Lab Montreal; Reza Babanezhad Harikandeh EMAIL, Samsung SAIT AI Lab Montreal; Jerry Huang EMAIL, Mila Quebec AI Institute, Université de Montréal; Simon Lacoste-Julien EMAIL, Mila Quebec AI Institute, Université de Montréal, Samsung SAIT AI Lab Montreal, Canada CIFAR AI Chair; Razvan Pascanu EMAIL, Google DeepMind; Sarath Chandar EMAIL, Mila Quebec AI Institute, Polytechnique Montreal, Canada CIFAR AI Chair
Pseudocode Yes
Algorithm 1: Adam with Critical Momenta
Require: initial parameters θ_0 and moments m_0, v_0^M; loss L; step size α; buffer m_c; capacity C; decay λ
for t = 1, 2, ... do
    Sample mini-batch and compute loss gradient
    Update 1st moments m_t with equation 4
    Aggregate buffer moments m_t^M from m_t with equation 4
    Update 2nd moments v_t^M with equation 5
    if buffer is not full then
        Add m_t to m_c
    else if Priority(m_t) > min(Priority(m_c)) then
        Replace the smallest-priority element with m_t
    end if
    Decay Priority(m_c) using λ
    Update parameters θ_t with equation 7
end for
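The buffer logic in Algorithm 1 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' implementation (see their repository for that): the `priority` criterion below uses the momentum's L2 norm purely as a stand-in, since the paper defines its own priority function, and the aggregation is shown as a simple mean.

```python
import numpy as np

class CriticalMomentaBuffer:
    """Illustrative sketch of the critical-momenta buffer in Algorithm 1.

    Assumptions (not from the paper): priority = L2 norm of the momentum,
    aggregation = mean over stored momenta. Defaults C=5, lambda=0.7 match
    the paper's reported defaults.
    """

    def __init__(self, capacity=5, decay=0.7):
        self.capacity = capacity   # buffer capacity C
        self.decay = decay         # priority decay lambda
        self.momenta = []          # stored momenta
        self.priorities = []       # Priority(m) for each stored momentum

    def priority(self, m):
        # Placeholder criterion; the paper's Priority() differs.
        return float(np.linalg.norm(m))

    def update(self, m):
        p = self.priority(m)
        if len(self.momenta) < self.capacity:
            # Buffer not full: add m_t
            self.momenta.append(m)
            self.priorities.append(p)
        else:
            # Replace the smallest-priority element if m_t beats it
            i = int(np.argmin(self.priorities))
            if p > self.priorities[i]:
                self.momenta[i] = m
                self.priorities[i] = p
        # Decay Priority(m_c) using lambda
        self.priorities = [self.decay * q for q in self.priorities]

    def aggregate(self):
        # Aggregated buffer momentum used in the parameter update
        return np.mean(self.momenta, axis=0)
```

Under these assumptions, one `update` call per step reproduces the add / replace / decay control flow of the pseudocode.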
Open Source Code Yes Our code is available at https://github.com/chandar-lab/CMOptimizer.
Open Datasets Yes We empirically demonstrate that it can improve model performance for image classification on ImageNet and CIFAR10/100, language modelling on Penn Treebank, and online learning tasks on Tiny ImageNet and 5-dataset. The Penn Treebank (PTB) (Marcus et al., 1993); CIFAR10 (Krizhevsky et al., 2009); ImageNet (Deng et al., 2009); Tiny ImageNet (Zhang et al., 2019); 5-dataset (Mehta et al., 2023)
Dataset Splits Yes
Dataset    Train set   Validation set
PTB        890K        70K
CIFAR10    40K         10K
CIFAR100   40K         10K
ImageNet   1281K       50K
Hardware Specification Yes All experiments were executed on a machine with an NVIDIA A100 Tensor Core GPU with 40 GB of memory.
Software Dependencies No We used a publicly available EfficientNet implementation in PyTorch (Paszke et al., 2019), with a weight decay (Loshchilov & Hutter, 2019) of 10^-4 and a learning rate scheduler where the initial learning rate is reduced by a factor of 10 every 30 epochs. We provide additional details about the grid search, datasets, and models in Appendix A.2.2. Explanation: The paper mentions "PyTorch" but does not specify a version number for it or any other key software components.
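The learning-rate schedule described above (initial rate divided by 10 every 30 epochs) is a plain step schedule. A minimal pure-Python sketch, independent of any framework (the experiments themselves use a PyTorch scheduler):

```python
def stepped_lr(initial_lr, epoch, drop_every=30, factor=0.1):
    """Step schedule: multiply the initial LR by `factor` once per
    `drop_every` completed epochs, as described in the setup above."""
    return initial_lr * factor ** (epoch // drop_every)
```

For example, with an initial rate of 0.1 the schedule yields 0.1 for epochs 0-29, 0.01 for epochs 30-59, and 0.001 from epoch 60 onward.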
Experiment Setup Yes Unless specified in the experiment description, the default set of other hyperparameters in all our experiments is {β1, β2, C, λ, ρ} = {0.9, 0.99, 5, 0.7, 0.05}, except in CIFAR10/100 experiments where β2 is set to 0.999. The default values of C and λ are decided based on the suggested values from McRae et al. (2022) and ρ based on Foret et al. (2021).
Hyper-parameter   Set
lr                {0.1, 0.01, 0.001, 0.0001}
β1                {0.9, 0.99, 0.999}
β2                {0.99, 0.999, 0.9999}
C                 {5, 20}
λ                 {0.7, 0.99}
ρ                 {0.01, 0.05, 0.1}
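The grid above defines an exhaustive search space. A minimal sketch of enumerating it (values copied from the table; the training and evaluation calls are omitted, and the key names are ours, not the paper's):

```python
from itertools import product

# Hyperparameter search space from the reported grid (4*3*3*2*2*3 = 432 configs)
GRID = {
    "lr":    [0.1, 0.01, 0.001, 0.0001],
    "beta1": [0.9, 0.99, 0.999],
    "beta2": [0.99, 0.999, 0.9999],
    "C":     [5, 20],      # buffer capacity
    "lam":   [0.7, 0.99],  # priority decay lambda
    "rho":   [0.01, 0.05, 0.1],
}

def grid_configs(grid):
    """Yield every hyperparameter combination as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))
```

Each yielded dict would parameterize one training run; the full grid here contains 432 combinations.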