Small Data, Big Decisions: Model Selection in the Small-Data Regime

Authors: Jörg Bornschein, Francesco Visin, Simon Osindero

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper we empirically study the generalization performance as the size of the training set varies over multiple orders of magnitude. These systematic experiments lead to some interesting and potentially very useful observations"
Researcher Affiliation | Industry | "DeepMind, London, United Kingdom. Correspondence to: Jörg Bornschein <bornschein@google.com>."
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to code repositories.
Open Datasets | Yes | "We conduct experiments on the following datasets: MNIST consists of 60k training and 10k test examples from 10 classes (LeCun, 1998). EMNIST provides 112,800 training and 18,800 test datapoints from 47 classes in its balanced subset (Cohen et al., 2017). CIFAR10 consists of 50k training and 10k test examples from 10 classes (Krizhevsky et al., 2009). ImageNet contains 1.28M training and 50k validation examples from 1000 classes (Russakovsky et al., 2015)."
Dataset Splits | Yes | "If not mentioned otherwise, we will use a 90%/10% training/calibration split of the available dataset. To assess a model's ability to generalise beyond the available dataset, we then successively evaluate them on a separate held-out or evaluation dataset." (A minimal split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software components like Adam, Momentum SGD, RMSProp, and ReLU, but it does not specify any version numbers for these or other software dependencies, such as programming languages or libraries.
Experiment Setup | Yes | "Throughout this study we use the following optimizers: Adam [...] with fixed learning rates {10⁻⁴, 3·10⁻⁴, 10⁻³} and 50 epochs. Momentum SGD with initial learning rates {10⁻⁴, 3·10⁻⁴, ..., 10⁻¹} cosine-decaying over 50 epochs down to 0 (0.9 momentum and ε = 10⁻⁴). RMSProp + cosine schedule [...] with initial learning rates of {0.03, 0.1, 0.3} and cosine-decaying to 0 over 50 epochs. For all experiments we use a batch size of 256 examples." (A sweep-configuration sketch follows the table.)
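
To make the 90%/10% training/calibration protocol quoted in the Dataset Splits row concrete, here is a minimal sketch. It assumes an index-based dataset and a random shuffle; the function name, the use of NumPy, and the seed are illustrative choices, not taken from the paper.

```python
import numpy as np

def train_calibration_split(num_examples, calibration_fraction=0.1, seed=0):
    """Split example indices into ~90% training and ~10% calibration subsets.

    The 90/10 ratio follows the protocol quoted above; the random shuffle,
    the seed, and the index-based representation are assumptions made for
    illustration only.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_examples)
    num_calibration = int(round(calibration_fraction * num_examples))
    return indices[num_calibration:], indices[:num_calibration]

# Example: a 60k-example training set (MNIST-sized) -> 54,000 train / 6,000 calibration.
train_idx, calibration_idx = train_calibration_split(60_000)
print(len(train_idx), len(calibration_idx))  # 54000 6000
```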
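
The Experiment Setup row can likewise be read as a small optimizer sweep. The sketch below expresses it with Optax (JAX) constructors purely for illustration; the paper does not name its framework, so the library choice, `steps_per_epoch`, and the helper name are assumptions, and the elided intermediate learning rates in the Momentum SGD grid ("...") are left unexpanded.

```python
import optax

BATCH_SIZE = 256   # "For all experiments we use a batch size of 256 examples."
NUM_EPOCHS = 50

def make_optimizer_sweep(steps_per_epoch):
    """Return (name, initial_lr, optimizer) triples for the quoted sweep.

    `steps_per_epoch` is a placeholder; it depends on the current training-set
    size and the batch size of 256.
    """
    total_steps = NUM_EPOCHS * steps_per_epoch
    sweep = []

    # Adam with fixed learning rates {1e-4, 3e-4, 1e-3}, trained for 50 epochs.
    for lr in (1e-4, 3e-4, 1e-3):
        sweep.append(("adam", lr, optax.adam(lr)))

    # Momentum SGD (momentum 0.9), cosine-decayed to 0 over 50 epochs.
    # The paper's grid is {1e-4, 3e-4, ..., 1e-1}; only the explicitly stated
    # values appear here, and the quoted eps = 1e-4 is not mapped in this sketch.
    for lr in (1e-4, 3e-4, 1e-1):
        schedule = optax.cosine_decay_schedule(init_value=lr, decay_steps=total_steps)
        sweep.append(("momentum_sgd", lr, optax.sgd(schedule, momentum=0.9)))

    # RMSProp with initial learning rates {0.03, 0.1, 0.3}, cosine-decayed to 0.
    for lr in (0.03, 0.1, 0.3):
        schedule = optax.cosine_decay_schedule(init_value=lr, decay_steps=total_steps)
        sweep.append(("rmsprop", lr, optax.rmsprop(schedule)))

    return sweep
```

Passing a schedule object directly as the learning rate is how Optax composes cosine decay with an optimizer; other frameworks would attach the schedule separately.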