Nesterov Accelerated Shuffling Gradient Method for Convex Optimization
Authors: Trang H. Tran, Katya Scheinberg, Lam M. Nguyen
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical simulations demonstrate the efficiency of our algorithm. |
| Researcher Affiliation | Collaboration | 1. School of Operations Research and Information Engineering, Cornell University, Ithaca, NY, USA. 2. IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY, USA. |
| Pseudocode | Yes | Algorithm 2: Nesterov Accelerated Shuffling Gradient (NASG) Method |
| Open Source Code | Yes | Our code can be found at the repository https://github.com/htt-trangtran/nasg. |
| Open Datasets | Yes | We have conducted the experiments on three classification datasets w8a (49,749 samples), ijcnn1 (91,701 samples) and covtype (406,709 samples) from LIBSVM (Chang & Lin, 2011). ... We test our algorithm using linear neural networks on three well-known image classification datasets: the MNIST dataset (LeCun et al., 1998) and Fashion-MNIST dataset (Xiao et al., 2017), both with 60,000 samples, and finally the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) with 50,000 images. |
| Dataset Splits | Yes | At the tuning stage, we test each method for 20 epochs. We run every algorithm with constant learning rates chosen by a grid search and select the ones that perform best. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, or memory) used to run its experiments. |
| Software Dependencies | Yes | All the algorithms are implemented in Python using the PyTorch package (Paszke et al., 2019). |
| Experiment Setup | Yes | The minibatch size is 256. ... We tune each algorithm using a constant learning rate and report the best final results. ... For SGD and NASG the searching grid is {1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001}. ... For SGD-M, ... Note that this momentum update is implemented in PyTorch with the default value β = 0.9. ... For Adam, we fixed two hyper-parameters β1 := 0.9, β2 := 0.999 as in the original paper. Since the default learning rate for Adam is 0.001, we let our searching grid be {0.005, 0.001, 0.0005}. (A hedged code sketch of the algorithm and this tuning setup appears after the table.) |
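
Below is a minimal, hedged sketch combining the epoch-level Nesterov momentum idea of Algorithm 2 (NASG) with the constant-learning-rate grid search described in the experiment-setup row. The function name `nasg_train`, the momentum schedule `gamma = (t - 1) / (t + 2)`, the logistic loss, and the synthetic data are illustrative assumptions, not the paper's exact formulation or experimental pipeline.

```python
# Hedged sketch: shuffled (without-replacement) gradient steps inside each epoch,
# with a Nesterov-style extrapolation applied once per epoch, in the spirit of
# Algorithm 2 (NASG). Schedules, loss, and data below are placeholders.
import torch

def nasg_train(X, y, epochs=20, lr=0.1, batch_size=256):
    n, d = X.shape
    w = torch.zeros(d, requires_grad=True)       # current iterate
    y_prev = w.detach().clone()                  # previous epoch-level anchor point
    for t in range(1, epochs + 1):
        perm = torch.randperm(n)                 # reshuffle once per epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            logits = X[idx] @ w
            loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y[idx])
            grad, = torch.autograd.grad(loss, w)
            with torch.no_grad():
                w -= lr * grad                   # plain shuffled gradient step
        with torch.no_grad():
            y_curr = w.detach().clone()          # end-of-epoch iterate
            gamma = (t - 1) / (t + 2)            # placeholder Nesterov-style schedule
            w.copy_(y_curr + gamma * (y_curr - y_prev))  # epoch-level momentum step
            y_prev = y_curr
    return w.detach()

if __name__ == "__main__":
    torch.manual_seed(0)
    X = torch.randn(2048, 20)                    # synthetic stand-in for a LIBSVM dataset
    y = (X @ torch.randn(20) > 0).float()
    # Grid search over constant learning rates, mirroring the setup row above.
    grid = [1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001]
    best = min(grid, key=lambda lr: torch.nn.functional.binary_cross_entropy_with_logits(
        X @ nasg_train(X, y, lr=lr), y).item())
    print("best constant learning rate:", best)
```

The structural point this sketch tries to capture is that the momentum extrapolation happens at the epoch level, on top of a full pass of without-replacement gradient steps, rather than at every minibatch as in per-iteration momentum methods such as SGD-M.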