Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Permutation-Based SGD: Is Random Optimal?
Authors: Shashank Rajput, Kangwook Lee, Dimitris Papailiopoulos
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We summarize FLIPFLOP's convergence rates in Table 1 and report the results of numerical verification in Section 6.2. |
| Researcher Affiliation | Academia | Shashank Rajput Kangwook Lee University of Wisconsin-Madison Dimitris Papailiopoulos |
| Pseudocode | Yes | Algorithm 1 Permutation-based SGD variants |
| Open Source Code | Yes | The code for all the experiments can be found at https://github.com/shashankrajput/flipflop. |
| Open Datasets | No | We randomly sample $n = 800$ points from a 100-dimensional sphere. Let the points be $x_i$ for $i = 1, \ldots, n$. Then, their mean is the solution to the following quadratic problem: $\arg\min_x F(x) = \frac{1}{n} \sum_{i=1}^{n} \lVert x - x_i \rVert^2$. We solve this problem by using the given algorithms. |
| Dataset Splits | No | The paper does not provide specific details about training, validation, or test dataset splits. The experiments involve optimizing functions rather than typical supervised learning tasks with predefined data splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We set $n = 800$, so that $n \ll K$ and hence the higher order terms of $K$ dominate the convergence rates. Note that both the axes are in logarithmic scale. [...] with step size $\alpha = \frac{10 \log(nK)}{\mu n K}$ (Theorem 4). |
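
The synthetic experiment quoted in the Open Datasets and Experiment Setup rows can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code (that lives in the repository linked above). It makes several assumptions the quoted text does not state: points are drawn uniformly from the unit sphere, the starting point `x0` defaults to the origin, the strong convexity constant is taken as µ = 2 (which follows from the quadratic objective as written), and the FlipFlop variant is read as "replay the previous epoch's permutation in reverse on every second epoch". The function names `sample_sphere` and `permutation_sgd` are hypothetical.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere in R^d (assumed distribution)."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def permutation_sgd(points, epochs, flipflop=False, x0=None, seed=0):
    """Run random-reshuffling SGD on F(x) = (1/n) * sum_i ||x - x_i||^2.

    With flipflop=True, every second epoch reuses the previous epoch's
    permutation in reverse order (an assumed reading of the FlipFlop idea).
    """
    rng = np.random.default_rng(seed)
    n, d = points.shape
    mu = 2.0  # strong convexity constant of F for this objective
    # Step size as quoted in the Experiment Setup row: 10 log(nK) / (mu n K).
    alpha = 10.0 * np.log(n * epochs) / (mu * n * epochs)
    # Copy the start point so the caller's array is never modified.
    x = np.zeros(d) if x0 is None else np.array(x0, dtype=float)
    perm = None
    for k in range(epochs):
        if flipflop and k % 2 == 1:
            perm = perm[::-1]          # flip the previous epoch's order
        else:
            perm = rng.permutation(n)  # fresh random permutation
        for i in perm:
            # Gradient of the i-th component ||x - x_i||^2 is 2 (x - x_i).
            x -= alpha * 2.0 * (x - points[i])
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = sample_sphere(800, 100, rng)
    target = pts.mean(axis=0)  # the minimizer of F is the mean of the points
    for ff in (False, True):
        x = permutation_sgd(pts, epochs=50, flipflop=ff, x0=np.ones(100))
        print("flipflop" if ff else "random reshuffle", np.linalg.norm(x - target))
```

Because the minimizer of this objective is the sample mean, the distance `np.linalg.norm(x - target)` gives a direct error measure. Sweeping `epochs` and plotting that error on log-log axes would mirror the kind of comparison the quoted setup describes, though the exact curves depend on the assumptions above.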