Nesterov Accelerated Shuffling Gradient Method for Convex Optimization

Authors: Trang H. Tran, Katya Scheinberg, Lam M. Nguyen

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical simulations demonstrate the efficiency of our algorithm.
Researcher Affiliation | Collaboration | (1) School of Operations Research and Information Engineering, Cornell University, Ithaca, NY, USA; (2) IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY, USA.
Pseudocode | Yes (see the sketch below) | Algorithm 2: Nesterov Accelerated Shuffling Gradient (NASG) Method.
Open Source Code | Yes | Our code can be found at the repository https://github.com/htt-trangtran/nasg.
Open Datasets | Yes (see the loading sketch below) | We have conducted the experiments on three classification datasets, w8a (49,749 samples), ijcnn1 (91,701 samples) and covtype (406,709 samples) from LIBSVM (Chang & Lin, 2011). ... We test our algorithm using linear neural networks on three well-known image classification datasets: the MNIST dataset (LeCun et al., 1998) and the Fashion-MNIST dataset (Xiao et al., 2017), both with 60,000 samples, and finally the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) with 50,000 images.
Dataset Splits | Yes | At the tuning stage, we test each method for 20 epochs. We run every algorithm with a constant learning rate, where the learning rates follow a grid search, and select the ones that perform best according to their results.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU or GPU models, or memory) used to run its experiments.
Software Dependencies | Yes | All the algorithms are implemented in Python using the PyTorch package (Paszke et al., 2019).
Experiment Setup | Yes (see the tuning sketch below) | The minibatch size is 256. ... We tune each algorithm using a constant learning rate and report the best final results. ... For SGD and NASG the searching grid is {1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001}. ... For SGD-M, ... Note that this momentum update is implemented in PyTorch with the default value β = 0.9. ... For Adam, we fixed two hyper-parameters β1 := 0.9, β2 := 0.999 as in the original paper. Since the default learning rate for Adam is 0.001, we let our searching grid be {0.005, 0.001, 0.0005}.
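To accompany the Pseudocode row, the following is a minimal sketch of an epoch-wise Nesterov-accelerated shuffling scheme in the spirit of NASG: a reshuffled single pass over the components per epoch, with the momentum term updated once at the end of each epoch. The inner step size lr/n and the (t - 1)/(t + 2) momentum weight are assumptions for illustration, not the exact constants of Algorithm 2 in the paper; consult the authors' repository for the real implementation.

import numpy as np

def nasg_sketch(grad_fn, n_samples, w0, lr=0.1, n_epochs=20, seed=0):
    """Hedged sketch of a Nesterov-accelerated shuffling gradient loop.

    `grad_fn(w, i)` is a hypothetical callable returning the gradient of the
    i-th component function at w.
    """
    rng = np.random.default_rng(seed)
    w_prev = w0.copy()      # iterate produced by the previous epoch
    y = w0.copy()           # momentum-extrapolated anchor point
    for t in range(1, n_epochs + 1):
        w = y.copy()
        perm = rng.permutation(n_samples)        # reshuffle once per epoch
        for i in perm:                           # one pass over the data
            w = w - (lr / n_samples) * grad_fn(w, i)
        beta = (t - 1) / (t + 2)                 # assumed Nesterov-style weight
        y = w + beta * (w - w_prev)              # momentum update at epoch end
        w_prev = w
    return w_prev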
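For the Open Datasets row, a short loading sketch under common tooling: the LIBSVM files are read with scikit-learn and the image datasets are fetched via torchvision. The local paths are placeholders; only the dataset names come from the text above.

from sklearn.datasets import load_svmlight_file
from torchvision import datasets, transforms

# LIBSVM-format classification data (download the files from the LIBSVM page first).
X_w8a, y_w8a = load_svmlight_file("data/w8a")

# Image datasets used with the linear networks (downloaded on first use).
mnist = datasets.MNIST(root="data", train=True, download=True, transform=transforms.ToTensor())
fashion = datasets.FashionMNIST(root="data", train=True, download=True, transform=transforms.ToTensor())
cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=transforms.ToTensor())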
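Finally, for the Experiment Setup row, a sketch of the constant-learning-rate grid search described there. Only the grids, the 20-epoch tuning budget, and the fixed momentum/Adam parameters come from the quoted text; `build_model` and `train_for_20_epochs` are hypothetical helpers, the SGD-M grid is elided in the excerpt, and NASG itself is not a built-in PyTorch optimizer.

import torch

# Learning-rate grids quoted in the setup row above (SGD-M's grid is elided there).
grids = {
    "SGD":  [1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001],
    "NASG": [1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001],
    "Adam": [0.005, 0.001, 0.0005],
}

def make_optimizer(name, params, lr):
    if name == "Adam":
        # beta1, beta2 fixed as in the original Adam paper, per the setup row.
        return torch.optim.Adam(params, lr=lr, betas=(0.9, 0.999))
    if name == "SGD-M":
        # PyTorch heavy-ball momentum with the default value 0.9.
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "SGD":
        return torch.optim.SGD(params, lr=lr)
    # NASG is not a built-in optimizer; use the authors' repository linked above.
    raise NotImplementedError(name)

def tune(name, build_model, train_for_20_epochs):
    """Try each learning rate in the grid for 20 epochs and keep the best loss."""
    best_lr, best_loss = None, float("inf")
    for lr in grids[name]:
        model = build_model()                        # hypothetical model factory
        opt = make_optimizer(name, model.parameters(), lr)
        loss = train_for_20_epochs(model, opt)       # hypothetical training loop
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss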