Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Authors: Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham M. Kakade, Boaz Barak
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do not confer any implicit bias advantages in online learning. |
| Researcher Affiliation | Academia | ¹SEAS, Harvard University; ²Kempner Institute, Harvard University. Correspondence to: Nikhil Vyas <EMAIL>, Depen Morwani <EMAIL>. |
| Pseudocode | No | The paper describes experimental procedures and theoretical derivations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Specifically, we run ResNet-18 on CIFAR-5m (Nakkiran et al., 2021), a synthetically generated version of CIFAR-10 with 5 million examples, ConvNeXt-T on ImageNet, and GPT-2-small on C4. |
| Dataset Splits | No | The paper mentions training on subsets of datasets (e.g., 'random subset of 50k samples' for CIFAR-5m offline, '128k examples' for ImageNet offline, 'random subset of roughly 100 million tokens' for C4 offline), and evaluates on a test set, but it does not provide specific details for training/validation/test splits (e.g., percentages or exact counts for all splits) needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions using PyTorch, OpenCV, and specific optimizers (SGD, AdamW, Adam) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For our CIFAR-5m experiments, we trained ResNet-18, on normalized (across channels) images and using the SGD optimizer with 0.9 momentum. For both offline and online learning, we used a learning rate of 0.025. ... For the ImageNet experiments, we used ConvNeXt-T... used a batch size of 2048 and learning rate of 1e-4 with the AdamW optimizer with weight decay 0.005. ... For all experiments we trained GPT-2-small (124m parameters) on the C4 dataset with sequence length 2048. The optimizer we use is Adam without weight decay and a constant learning rate of 6 × 10⁻⁴. |
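
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration record, which also makes the gaps explicit. The following is a minimal sketch, not the authors' code: the names `ExperimentSetup`, `SETUPS`, and `unreported_fields` are hypothetical, values elided in the row (the `...` passages) are left as `None` rather than guessed, and the C4 learning rate is an assumption that reads the value in the row as 6 × 10⁻⁴.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters quoted in the Experiment Setup row.

    A value of None means the quoted excerpt does not state it.
    """
    model: str
    dataset: str
    optimizer: str
    learning_rate: float
    momentum: Optional[float] = None
    weight_decay: Optional[float] = None
    batch_size: Optional[int] = None
    sequence_length: Optional[int] = None


# Hypothetical keys; values are taken from the table row only.
SETUPS = {
    "cifar5m": ExperimentSetup(
        model="ResNet-18", dataset="CIFAR-5m", optimizer="SGD",
        learning_rate=0.025, momentum=0.9,
    ),
    "imagenet": ExperimentSetup(
        model="ConvNeXt-T", dataset="ImageNet", optimizer="AdamW",
        learning_rate=1e-4, weight_decay=0.005, batch_size=2048,
    ),
    "c4": ExperimentSetup(
        model="GPT-2-small", dataset="C4", optimizer="Adam",
        # Assumption: the row's learning rate is read as 6e-4.
        learning_rate=6e-4,
        # "Adam without weight decay" is encoded as 0.0.
        weight_decay=0.0, sequence_length=2048,
    ),
}


def unreported_fields(setup: ExperimentSetup) -> List[str]:
    """Return the names of fields the quoted excerpt leaves unspecified."""
    return [name for name, value in vars(setup).items() if value is None]


if __name__ == "__main__":
    for key, setup in SETUPS.items():
        print(key, "missing:", unreported_fields(setup))
```

Running the script lists, per experiment, which settings would still have to be recovered from the paper itself before attempting a reproduction.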