Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems
Authors: Itay Safran, Ohad Shamir
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically verify the phenomenon that our theory predicts by running simulations on the constructions used in Equations (1) and (2). Our lower bounds show that to beat with-replacement SGD on those problem instances, the number of epochs must be on the order of the condition number. However, since our analysis is asymptotic in nature, ignoring constants and logarithmic terms, it is not clear to what extent this phenomenon can be observed in practice. To investigate this, we averaged the performance of 100 SGD instantiations over various values of k ranging from k = 40 up to k = 2,000, where for each value a suitable step size of η = log(nk) / (λnk) was chosen. Our problem parameters were chosen to satisfy n = 500, G = 1, λ = 1 and λmax = 200. Note that this implies that the condition number equals 200 in the constructed problem instances. Each SGD instantiation was initialized from x0 = (G/(2λ), G/(2λmax)), which satisfies both Assumptions 1 and 2. We used Python 3.6 in our code, which is freely available at https://github.com/ItaySafran/SGD_condition_number. Our code was run on a single machine with an Intel Core i7-8550U 1.80GHz processor. |
| Researcher Affiliation | Academia | Itay Safran Princeton University isafran@princeton.edu Ohad Shamir Weizmann Institute of Science ohad.shamir@weizmann.ac.il |
| Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include a distinct pseudocode block or algorithm listing. |
| Open Source Code | Yes | We used Python 3.6 in our code, which is freely available at https://github.com/ItaySafran/SGD_condition_number. |
| Open Datasets | No | The paper does not use traditional datasets but rather mathematical constructions for its experiments, based on Equations (1) and (2). Thus, there is no externally available dataset to link. |
| Dataset Splits | No | The paper describes the number of SGD instantiations and parameter choices for its simulations but does not specify training, validation, or test data splits, as it uses constructed functions rather than empirical datasets with explicit splits. |
| Hardware Specification | Yes | Our code was run on a single machine with an Intel Core i7-8550U 1.80GHz processor. |
| Software Dependencies | Yes | We used Python 3.6 in our code |
| Experiment Setup | Yes | To investigate this, we averaged the performance of 100 SGD instantiations over various values of k ranging from k = 40 up to k = 2,000, where for each value a suitable step size of η = log(nk) / (λnk) was chosen. Our problem parameters were chosen to satisfy n = 500, G = 1, λ = 1 and λmax = 200. Each SGD instantiation was initialized from x0 = (G/(2λ), G/(2λmax)). |
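The quoted experiment setup is concrete enough to sketch in code. The snippet below is a minimal, hypothetical reconstruction: it reuses the stated hyperparameters (n = 500, G = 1, λ = 1, λmax = 200, η = log(nk)/(λnk), initialization x0 = (G/(2λ), G/(2λmax))) and the random-reshuffling loop, but the paper's actual objective comes from its Equations (1) and (2), which are not reproduced here; a placeholder quadratic with curvatures λ and λmax and a mean-zero per-component perturbation stands in for them.

```python
import numpy as np

# Hyperparameters quoted in the paper's experiment setup.
n, G, lam, lam_max = 500, 1.0, 1.0, 200.0  # condition number = lam_max / lam = 200
x0 = np.array([G / (2 * lam), G / (2 * lam_max)])  # stated initialization

def run_shuffled_sgd(k, seed=0):
    """One instantiation of random-reshuffling SGD for k epochs.

    NOTE: the component functions below are a placeholder stand-in, NOT the
    paper's constructions from its Equations (1) and (2).
    """
    rng = np.random.default_rng(seed)
    eta = np.log(n * k) / (lam * n * k)  # step size from the paper
    x = x0.copy()
    for _ in range(k):                # k epochs
        perm = rng.permutation(n)     # fresh shuffle each epoch
        for i in perm:
            # Placeholder component gradient: quadratic in each coordinate
            # plus a sign perturbation that averages to zero over an epoch.
            sign = 1.0 if i < n // 2 else -1.0
            grad = np.array([lam * x[0] + sign * G,
                             lam_max * x[1] - sign * G])
            x = x - eta * grad
    return x

# The paper averages 100 such instantiations per value of k (k = 40 ... 2,000);
# a single run at the smallest k is shown here for brevity.
x_final = run_shuffled_sgd(k=40)
```

Averaging `run_shuffled_sgd(k, seed=s)` over 100 seeds per value of k would mirror the averaging procedure described in the setup; the placeholder objective only illustrates the loop structure, not the paper's lower-bound instances.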