Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Authors: Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham M. Kakade, Boaz Barak
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do not confer any implicit bias advantages in online learning. |
| Researcher Affiliation | Academia | ¹SEAS, Harvard University; ²Kempner Institute, Harvard University. Correspondence to: Nikhil Vyas <nikhil@g.harvard.edu>, Depen Morwani <dmorwani@g.harvard.edu>. |
| Pseudocode | No | The paper describes experimental procedures and theoretical derivations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Specifically, we run ResNet-18 on CIFAR-5m (Nakkiran et al., 2021), a synthetically generated version of CIFAR-10 with 5 million examples, ConvNeXt-T on ImageNet, and GPT-2-small on C4. |
| Dataset Splits | No | The paper mentions training on subsets of datasets (e.g., 'random subset of 50k samples' for CIFAR-5m offline, '128k examples' for ImageNet offline, 'random subset of roughly 100 million tokens' for C4 offline), and evaluates on a test set, but it does not provide specific details for training/validation/test splits (e.g., percentages or exact counts for all splits) needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper mentions using PyTorch, OpenCV, and specific optimizers (SGD, AdamW, Adam) but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For our CIFAR-5m experiments, we trained ResNet-18 on normalized (across channels) images using the SGD optimizer with 0.9 momentum. For both offline and online learning, we used a learning rate of 0.025. ... For the ImageNet experiments, we used ConvNeXt-T... used a batch size of 2048 and learning rate of 1e-4 with the AdamW optimizer with weight decay 0.005. ... For all experiments we trained GPT-2-small (124m parameters) on the C4 dataset with sequence length 2048. The optimizer we use is Adam without weight decay and a constant learning rate of 6 × 10⁻⁴. (These settings are sketched in code below.) |
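A minimal PyTorch sketch of the optimizer settings quoted in the Experiment Setup row, assuming standard torchvision model constructors; this is not the authors' released code (none is available), and the GPT-2-small constructor is a placeholder since the paper does not name a specific implementation.

```python
# Sketch of the reported optimizer configurations (assumptions, not the authors' code).
import torch
import torchvision

# CIFAR-5m: ResNet-18 trained with SGD, momentum 0.9, learning rate 0.025
# (same learning rate reported for both offline and online training).
resnet = torchvision.models.resnet18(num_classes=10)
sgd = torch.optim.SGD(resnet.parameters(), lr=0.025, momentum=0.9)

# ImageNet: ConvNeXt-T with AdamW, learning rate 1e-4, weight decay 0.005;
# the reported batch size of 2048 would be set in the DataLoader, not here.
convnext = torchvision.models.convnext_tiny(num_classes=1000)
adamw = torch.optim.AdamW(convnext.parameters(), lr=1e-4, weight_decay=0.005)

# C4: GPT-2-small (124M parameters) with Adam, no weight decay, constant
# learning rate 6e-4, sequence length 2048. `build_gpt2_small()` is a
# hypothetical constructor; the paper does not specify the implementation used.
# gpt2 = build_gpt2_small()
# adam = torch.optim.Adam(gpt2.parameters(), lr=6e-4, weight_decay=0.0)
```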