The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers

Authors: Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Validation: We give evidence that the bootstrap error is small in realistic settings for supervised image classification, by conducting extensive experiments on large-scale tasks (including variants of CIFAR-10 and ImageNet) for many architectures (Section 4).
Researcher Affiliation | Collaboration | Preetum Nakkiran (Harvard University, preetum@cs.harvard.edu); Behnam Neyshabur (Blueshift, Alphabet, neyshabur@google.com); Hanie Sedghi (Google Research, Brain team, hsedghi@google.com)
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured, step-by-step algorithmic descriptions.
Open Source Code | Yes | CIFAR-5m is a dataset of 6 million synthetic CIFAR-10-like images. We release this dataset publicly on Google Cloud Storage, as described in https://github.com/preetum/cifar5m.
Open Datasets | Yes | CIFAR-5m is a dataset of 6 million synthetic CIFAR-10-like images. We release this dataset publicly on Google Cloud Storage, as described in https://github.com/preetum/cifar5m.
Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention or detail a validation set split or how it was used.
Hardware Specification | Yes | All experiments run on NVIDIA V100 GPUs.
Software Dependencies | No | The paper lists software used (e.g., PyTorch, NumPy, Hugging Face transformers) but generally does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | All architectures in the Real World are trained with n = 50K samples from CIFAR-5m, using SGD on the cross-entropy loss, with cosine learning rate decay, for 100 epochs. We use standard CIFAR-10 data augmentation of random crop+horizontal flip. All models use batch size 128... ResNets and MLP use initial learning rate 0.1 and momentum 0.9. ViT uses initial LR 0.01, momentum 0.9, and weight decay 1e-4.
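
To make the quoted Real World schedule concrete, the following is a minimal sketch in plain Python of the step count and cosine learning-rate decay implied by n = 50K, batch size 128, 100 epochs, and initial learning rate 0.1 (the ResNet/MLP setting). Several details are assumptions rather than statements from the paper: that the last partial batch of each epoch is kept, that the decay is applied per step rather than per epoch, and that the learning rate decays to zero with the usual half-cosine form. The names cosine_lr, STEPS_PER_EPOCH, and TOTAL_STEPS are hypothetical and used only for illustration.

```python
import math

# Values quoted in the Experiment Setup row (ResNet/MLP setting).
NUM_SAMPLES = 50_000
BATCH_SIZE = 128
EPOCHS = 100
BASE_LR = 0.1

# Assumption: the last partial batch of each epoch is kept, not dropped.
STEPS_PER_EPOCH = math.ceil(NUM_SAMPLES / BATCH_SIZE)
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS


def cosine_lr(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR):
    """Half-cosine decay from base_lr at step 0 to 0 at total_steps.

    Assumed form of "cosine learning rate decay"; the summary above does
    not state the final learning rate or the update granularity.
    """
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


if __name__ == "__main__":
    # Print the learning rate at a few points in training.
    for epoch in (0, 25, 50, 75, 100):
        print(epoch, cosine_lr(epoch * STEPS_PER_EPOCH))
```

Under these assumptions each epoch takes 391 steps, so training runs for 39,100 steps in total. For the ViT setting, the same function could be called with base_lr=0.01.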