Adam with Bandit Sampling for Deep Learning

Authors: Rui Liu, Tianyi Wu, Barzan Mozafari

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We theoretically show that ADAMBS improves the convergence rate of Adam in some cases. Experiments on various models and datasets demonstrate ADAMBS's fast convergence in practice. Through an extensive empirical study across various optimization tasks and datasets, we also show that this new method yields significant speedups in practice as well.
Researcher Affiliation | Academia | Rui Liu, Tianyi Wu, Barzan Mozafari. Computer Science and Engineering, University of Michigan, Ann Arbor. {ruixliu, tianyiwu, mozafari}@umich.edu
Pseudocode | Yes | Algorithm 1: ADAMBS, our proposed Adam with bandit sampling. Algorithm 2: The distribution update rule for p_t. (A rough sketch of this training loop appears after the table.)
Open Source Code | No | No explicit statement or link is provided for the release of the authors' own source code for the described method.
Open Datasets | Yes | In total, 5 datasets are used: MNIST, Fashion MNIST, CIFAR10, CIFAR100, and IMDB. CIFAR10 and CIFAR100 are labeled subsets of the 80 million tiny images dataset [20]. The Fashion MNIST dataset [35] is similar to MNIST except that its images fall into 10 fashion categories. (A loading sketch appears after the table.)
Dataset Splits | No | The paper mentions using common datasets such as CIFAR10, CIFAR100, and Fashion MNIST, but does not explicitly provide the training, validation, and test splits used in the experiments (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., CPU/GPU models, memory, or cloud instances).
Software Dependencies | No | The paper states that 'Experiments are conducted using Keras [10] with TensorFlow [1]', but it does not specify version numbers for these software components.
Experiment Setup | Yes | Specifically, β1 and β2 are common hyperparameters to Adam, ADAM-IMPT, and ADAMBS, and are chosen to be 0.9 and 0.999, respectively. The mini-batch size is set to 128 and the learning rate to 0.001 for all methods on all three datasets. All three methods are used to train CNN models for 10 epochs. The batch size is set to 32, the learning rate to 10^-6, and the maximum number of epochs to 200 for all methods on both datasets. We set the batch size to 30 and the learning rate to 0.001, and run all methods for 10 epochs. (A baseline optimizer configuration sketch appears after the table.)
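
The Pseudocode row above summarizes Algorithm 1 (ADAMBS, Adam with bandit sampling over training examples) and Algorithm 2 (the update rule for the sampling distribution p_t). The NumPy sketch below is only a rough illustration of that structure, not the authors' code: it keeps one weight per training example, samples each mini-batch from the induced distribution, applies a standard Adam step, and then increases the weights of examples that produced large losses. The toy regression data, the feedback normalization, and the bandit learning rate `eta` are all illustrative assumptions, and the multiplicative update is only a simplified stand-in for the paper's Algorithm 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n examples of linear regression, standing in for the real datasets.
n, d = 1000, 20
X = rng.normal(size=(n, d))
true_theta = rng.normal(size=d)
y = X @ true_theta + 0.1 * rng.normal(size=n)

# Adam state (beta1, beta2, and the learning rate follow the values quoted above).
theta = np.zeros(d)
m, v = np.zeros(d), np.zeros(d)
beta1, beta2, lr, eps = 0.9, 0.999, 1e-3, 1e-8

# Bandit state: one sampling weight per training example.
w = np.ones(n)
eta = 0.01          # assumed bandit learning rate (not taken from the paper)
batch_size = 128

for t in range(1, 201):
    p = w / w.sum()

    # Sample a mini-batch according to the current distribution p_t.
    idx = rng.choice(n, size=batch_size, replace=False, p=p)
    Xb, yb = X[idx], y[idx]

    # Per-example squared-error losses and the mini-batch gradient.
    residual = Xb @ theta - yb
    losses = 0.5 * residual ** 2
    grad = Xb.T @ residual / batch_size

    # Standard Adam step with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (np.sqrt(v_hat) + eps)

    # Feedback for the bandit: examples with larger loss are treated as more
    # informative and get their sampling weight increased (a simplified,
    # assumed stand-in for the importance-weighted update in Algorithm 2).
    gain = losses / (losses.max() + 1e-12)
    w[idx] *= np.exp(eta * gain)
```

In the paper, the distribution update and its parameters come from a multi-armed bandit analysis with convergence guarantees; the update above only mimics the qualitative behavior of sampling informative examples more often.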
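All five datasets listed in the Open Datasets row are available through the Keras dataset API mentioned in the Software Dependencies row. The snippet below is only a sketch of one standard way to obtain them with their default train/test splits; the paper does not state that it loaded the data this way, and the IMDB vocabulary cutoff is an assumed preprocessing choice.

```python
# Sketch: obtaining the five datasets used in the paper via the Keras dataset
# API. Each loader returns predefined train/test splits.
from tensorflow.keras import datasets

(x_mnist_tr, y_mnist_tr), (x_mnist_te, y_mnist_te) = datasets.mnist.load_data()
(x_fmnist_tr, y_fmnist_tr), (x_fmnist_te, y_fmnist_te) = datasets.fashion_mnist.load_data()
(x_c10_tr, y_c10_tr), (x_c10_te, y_c10_te) = datasets.cifar10.load_data()
(x_c100_tr, y_c100_tr), (x_c100_te, y_c100_te) = datasets.cifar100.load_data()
# num_words is an assumed vocabulary cutoff, not a value reported in the paper.
(x_imdb_tr, y_imdb_tr), (x_imdb_te, y_imdb_te) = datasets.imdb.load_data(num_words=10000)

print(x_mnist_tr.shape, x_c10_tr.shape, len(x_imdb_tr))
```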
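The Experiment Setup row reports β1 = 0.9, β2 = 0.999 together with per-task batch sizes, learning rates, and epoch counts. For the plain Adam baseline these values map directly onto the Keras optimizer; ADAMBS and ADAM-IMPT are not part of stock Keras, so only the baseline configuration is sketched here, and the small CNN architecture is an illustrative assumption rather than the paper's model.

```python
import tensorflow as tf

# Baseline Adam configured with the hyperparameters quoted above for the
# CNN experiments: beta_1 = 0.9, beta_2 = 0.999, learning rate 0.001.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999)

# Illustrative small CNN (the paper's exact architectures are not reproduced here).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Mini-batch size 128 and 10 epochs, as reported for the CNN experiments:
# model.fit(x_train, y_train, batch_size=128, epochs=10)
```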