Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

Authors: Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham M. Kakade, Boaz Barak

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do not confer any implicit bias advantages in online learning.
Researcher Affiliation | Academia | SEAS, Harvard University; Kempner Institute, Harvard University. Correspondence to: Nikhil Vyas <nikhil@g.harvard.edu>, Depen Morwani <dmorwani@g.harvard.edu>.
Pseudocode | No | The paper describes experimental procedures and theoretical derivations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Specifically, we run ResNet-18 on CIFAR-5m (Nakkiran et al., 2021), a synthetically generated version of CIFAR-10 with 5 million examples, ConvNeXt-T on ImageNet, and GPT-2-small on C4.
Dataset Splits | No | The paper mentions training on subsets of datasets (e.g., a 'random subset of 50k samples' for CIFAR-5m offline, '128k examples' for ImageNet offline, and a 'random subset of roughly 100 million tokens' for C4 offline) and evaluates on a test set, but it does not provide the specific training/validation/test split details (e.g., percentages or exact counts for all splits) needed for reproduction.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory configurations.
Software Dependencies | No | The paper mentions using PyTorch, OpenCV, and specific optimizers (SGD, AdamW, Adam) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | For our CIFAR-5m experiments, we trained ResNet-18 on normalized (across channels) images using the SGD optimizer with 0.9 momentum. For both offline and online learning, we used a learning rate of 0.025. ... For the ImageNet experiments, we used ConvNeXt-T... used a batch size of 2048 and learning rate of 1e-4 with the AdamW optimizer with weight decay 0.005. ... For all experiments we trained GPT-2-small (124M parameters) on the C4 dataset with sequence length 2048. The optimizer we use is Adam without weight decay and a constant learning rate of 6e-4.
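
As an illustration of the experiment-setup row above, here is a minimal PyTorch sketch of the three reported optimizer configurations. Only the optimizer choices and hyperparameters come from the paper; the model constructors, class counts, and the GPT-2 stand-in are assumptions, since the paper releases no code.

```python
# Sketch of the reported optimizer configurations. Hyperparameters are from the paper;
# model constructors and class counts are assumptions (no official code is released).
import torch
from torchvision.models import resnet18, convnext_tiny

# CIFAR-5m: ResNet-18 with SGD, momentum 0.9, learning rate 0.025 (offline and online).
cifar_model = resnet18(num_classes=10)
cifar_opt = torch.optim.SGD(cifar_model.parameters(), lr=0.025, momentum=0.9)

# ImageNet: ConvNeXt-T with AdamW, learning rate 1e-4, weight decay 0.005 (batch size 2048).
imagenet_model = convnext_tiny(num_classes=1000)
imagenet_opt = torch.optim.AdamW(imagenet_model.parameters(), lr=1e-4, weight_decay=0.005)

# C4: GPT-2-small (124M parameters), sequence length 2048, Adam without weight decay,
# constant learning rate 6e-4. The paper does not name a GPT-2 implementation, so a
# placeholder module stands in here purely so the optimizer call is runnable.
gpt2_stand_in = torch.nn.Linear(768, 50257)  # placeholder for a 124M-parameter GPT-2
gpt2_opt = torch.optim.Adam(gpt2_stand_in.parameters(), lr=6e-4, weight_decay=0.0)
```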