Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems

Authors: Itay Safran, Ohad Shamir

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we empirically verify the phenomenon that our theory predicts by running simulations on the constructions used in Equations (1) and (2).5 Our lower bounds show that to beat withreplacement on those problem instances, the number of epochs must be in the order of magnitude of the condition number. However, since our analysis is asymptotic in its nature, ignoring constants and logarithmic terms, it is not clear to what extent this phenomenon can be observed in practice. To investigate this, we averaged the performance of 100 SGD instantiations over various values of k ranging from k = 40 up to k = 2, 000, where for each value a suitable step size of η = log(nk) / (λnk) was chosen. Our problem parameters were chosen to satisfy n = 500, G = 1, λ = 1 and λmax = 200. Note that this implies that the condition number equals 200 in the constructed problem instances. Each SGD instantiation was initialized from x0 = G/(2λ), G/(2λmax), which satisfies both Assumptions 1 and 2. We used Python 3.6 in our code, which is freely available at https://github.com/Itay Safran/SGD_condition_number. Our code was run on a single machine with an Intel Core i7-8550U 1.80GHz processor.
Researcher Affiliation Academia Itay Safran Princeton University isafran@princeton.edu Ohad Shamir Weizmann Institute of Science ohad.shamir@weizmann.ac.il
Pseudocode No The paper describes algorithms and mathematical formulations but does not include a distinct pseudocode block or algorithm listing.
Open Source Code Yes We used Python 3.6 in our code, which is freely available at https://github.com/Itay Safran/SGD_condition_number.
Open Datasets No The paper does not use traditional datasets but rather mathematical constructions for its experiments, based on equations (1) and (2). Thus, there's no externally available dataset to link.
Dataset Splits No The paper describes the number of SGD instantiations and parameter choices for its simulations but does not specify training, validation, or test data splits, as it uses constructed functions rather than empirical datasets with explicit splits.
Hardware Specification Yes Our code was run on a single machine with an Intel Core i7-8550U 1.80GHz processor.
Software Dependencies Yes We used Python 3.6 in our code
Experiment Setup Yes To investigate this, we averaged the performance of 100 SGD instantiations over various values of k ranging from k = 40 up to k = 2, 000, where for each value a suitable step size of η = log(nk) / (λnk) was chosen. Our problem parameters were chosen to satisfy n = 500, G = 1, λ = 1 and λmax = 200. Each SGD instantiation was initialized from x0 = G/(2λ), G/(2λmax).