SMG: A Shuffling Gradient-Based Method with Momentum

Authors: Trang H. Tran, Lam M. Nguyen, Quoc Tran-Dinh

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our algorithms via numerical simulations on standard datasets and compare them with existing shuffling methods. Our tests have shown encouraging performance of the new algorithms.
Researcher Affiliation | Collaboration | (1) School of Operations Research and Information Engineering, Cornell University, Ithaca, NY, USA; (2) IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY, USA; (3) Department of Statistics and Operations Research, The University of North Carolina at Chapel Hill, NC, USA.
Pseudocode | Yes | Algorithm 1: Shuffling Momentum Gradient (SMG). (See the illustrative sketch after this table.)
Open Source Code | No | The paper mentions using third-party packages such as PyTorch and TensorFlow for implementation, but does not provide an explicit statement or link to the open-source code for its own methodology.
Open Datasets | Yes | For the fully connected setting, we train the classic LeNet-300-100 model (LeCun et al., 1998) on the Fashion-MNIST dataset (Xiao et al., 2017) with 60,000 images. We also use the convolutional LeNet-5 (LeCun et al., 1998) architecture to train the well-known CIFAR-10 dataset (Krizhevsky & Hinton, 2009) with 50,000 samples. We have conducted the experiments on two classification datasets w8a (49,749 samples) and ijcnn1 (91,701 samples) from LIBSVM (Chang & Lin, 2011). (See the data-loading sketch after this table.)
Dataset Splits | No | The paper reports total sample counts for Fashion-MNIST (60,000), CIFAR-10 (50,000), w8a (49,749), and ijcnn1 (91,701), but does not explicitly state training/validation/test split percentages or per-split sample counts in the experimental setup.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper states "All the algorithms are implemented and run in Python using the PyTorch package (Paszke et al., 2019)", but does not provide specific version numbers for Python or PyTorch, or any other software dependencies.
Experiment Setup | Yes | For the latter two algorithms, we use the hyper-parameter settings recommended and widely used in practice (i.e. momentum: 0.9 for SGD-M, and two hyper-parameters β1 := 0.9, β2 := 0.999 for Adam). For our new SMG algorithm, we fixed the parameter β := 0.5 since it usually performs the best in our experiments. ... We tune each algorithm using constant learning rate and report the best result obtained. (See the optimizer-configuration sketch after this table.)
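Since Algorithm 1 is only referenced by name above, the following is a minimal sketch of a shuffling gradient loop with per-epoch momentum in the spirit of SMG. The function name `smg_sketch`, the gradient oracle `grad_fn`, and the exact placement of the momentum anchor are illustrative assumptions, not the authors' reference implementation; `beta` corresponds to the momentum parameter reported as 0.5 in the experiments.

```python
import numpy as np

def smg_sketch(grad_fn, w0, n, epochs=10, lr=0.1, beta=0.5, seed=0):
    """Illustrative shuffling gradient loop with momentum (not the authors' code).

    grad_fn(w, i) is assumed to return the gradient of the i-th component
    function f_i at w; n is the number of component functions (samples).
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    m_prev = np.zeros_like(w)              # momentum carried over from the previous epoch
    for _ in range(epochs):
        perm = rng.permutation(n)          # reshuffle the sample order each epoch
        grad_sum = np.zeros_like(w)
        for i in perm:
            g = grad_fn(w, i)
            grad_sum += g
            m = beta * m_prev + (1.0 - beta) * g   # convex combination with the epoch anchor
            w -= (lr / n) * m                      # small per-sample step
        m_prev = grad_sum / n              # epoch-averaged gradient becomes the next anchor
    return w
```

The structural feature this sketch tries to convey is that the momentum anchor is refreshed once per epoch from averaged gradients rather than after every sample; the precise update rules should be taken from Algorithm 1 in the paper.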
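The image datasets named in the Open Datasets row are available through torchvision; a minimal loading sketch is below. The preprocessing (plain tensor conversion) is an assumption, since the paper does not state its transforms, and the LIBSVM datasets w8a and ijcnn1 would need to be obtained separately from the LIBSVM site.

```python
from torchvision import datasets, transforms

# Assumed preprocessing: plain tensor conversion; the paper does not specify transforms.
tfm = transforms.ToTensor()

fmnist_train = datasets.FashionMNIST("./data", train=True, download=True, transform=tfm)
cifar_train = datasets.CIFAR10("./data", train=True, download=True, transform=tfm)

print(len(fmnist_train))  # 60000 training images, matching the count quoted in the paper
print(len(cifar_train))   # 50000 training images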
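The baseline hyper-parameters quoted in the Experiment Setup row map directly onto stock PyTorch optimizers; the snippet below sketches those settings with placeholder learning rates, since the paper tunes a constant learning rate per algorithm rather than reporting one fixed value. SMG itself is not a built-in optimizer, so its β := 0.5 would apply to a custom implementation such as the sketch above.

```python
import torch

# Placeholder model; the paper trains LeNet-300-100 and LeNet-5 architectures.
model = torch.nn.Linear(28 * 28, 10)

# SGD with momentum 0.9, as quoted for the SGD-M baseline (lr is a placeholder, tuned in the paper).
sgd_m = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Adam with beta1 = 0.9 and beta2 = 0.999, as quoted (lr again a placeholder).
adam = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
```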