The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers
Authors: Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Validation: We give evidence that the bootstrap error is small in realistic settings for supervised image classification, by conducting extensive experiments on large-scale tasks (including variants of CIFAR-10 and ImageNet) for many architectures (Section 4). |
| Researcher Affiliation | Collaboration | Preetum Nakkiran Harvard University preetum@cs.harvard.edu Behnam Neyshabur Blueshift, Alphabet neyshabur@google.com Hanie Sedghi Google Research, Brain team hsedghi@google.com |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured, step-by-step algorithmic descriptions. |
| Open Source Code | Yes | CIFAR-5m is a dataset of 6 million synthetic CIFAR-10-like images. We release this dataset publicly on Google Cloud Storage, as described in https://github.com/preetum/cifar5m. |
| Open Datasets | Yes | CIFAR-5m is a dataset of 6 million synthetic CIFAR-10-like images. We release this dataset publicly on Google Cloud Storage, as described in https://github.com/preetum/cifar5m. (See the loading sketch after this table.) |
| Dataset Splits | No | The paper specifies training and test sets but does not explicitly mention or detail a validation set split or how it was used. |
| Hardware Specification | Yes | All experiments run on NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper lists software used (e.g., PyTorch, NumPy, Hugging Face transformers) but generally does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | All architectures in the Real World are trained with n = 50K samples from CIFAR-5m, using SGD on the cross-entropy loss, with cosine learning rate decay, for 100 epochs. We use standard CIFAR-10 data augmentation of random crop+horizontal flip. All models use batch size 128... ResNets and MLP use initial learning rate 0.1 and momentum 0.9. ViT uses initial LR 0.01, momentum 0.9, and weight decay 1e-4. (See the training sketch after this table.) |
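
For context on the Open Datasets row: the linked repository describes CIFAR-5m as a set of NumPy archives hosted on Google Cloud Storage. Below is a minimal loading sketch; the `part0.npz` file name and the `X`/`Y` array keys follow that README, but the bucket path and expected shapes should be treated as assumptions here, not verified details.

```python
# Minimal sketch of loading one shard of CIFAR-5m, assuming the layout
# described at https://github.com/preetum/cifar5m: NumPy .npz archives on
# Google Cloud Storage with uint8 images under "X" and labels under "Y".
# The gs:// path, file name, and array keys are assumptions from that README.
import numpy as np

# Download a shard first, e.g.: gsutil -m cp gs://gresearch/cifar5m/part0.npz .
part = np.load("part0.npz")
images = part["X"]   # expected shape (N, 32, 32, 3), dtype uint8
labels = part["Y"]   # expected shape (N,), CIFAR-10 class indices
print(images.shape, labels.shape)
```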
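
The Experiment Setup row specifies the Real World training recipe precisely enough to render as code. Below is a minimal PyTorch sketch of those stated hyperparameters (SGD with momentum on cross-entropy, cosine learning rate decay over 100 epochs, batch size 128, random crop + horizontal flip). It is an illustration, not the authors' released code: the ResNet-18 choice is one of several architectures the paper trains, and torchvision's CIFAR-10 stands in for the 50K-sample draw from CIFAR-5m.

```python
# Minimal PyTorch sketch of the stated setup: SGD + momentum on the
# cross-entropy loss, cosine LR decay over 100 epochs, batch size 128,
# random crop + horizontal flip augmentation. ResNet-18 and the CIFAR-10
# stand-in dataset are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.RandomCrop(32, padding=4),      # standard CIFAR-10 augmentation
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# Stand-in data: the paper instead trains on n = 50K samples from CIFAR-5m.
train_set = torchvision.datasets.CIFAR10(".", train=True, download=True,
                                         transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                     shuffle=True, num_workers=2)

model = torchvision.models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()
# ResNets/MLP: initial LR 0.1, momentum 0.9 (ViT uses LR 0.01 and
# weight decay 1e-4 instead).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine decay, stepped once per epoch
```

Stepping the cosine schedule once per epoch with `T_max=100` anneals the learning rate from its initial value to near zero over the full 100-epoch run, matching the "cosine learning rate decay, for 100 epochs" description in the quoted setup.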