Small Data, Big Decisions: Model Selection in the Small-Data Regime
Authors: Jörg Bornschein, Francesco Visin, Simon Osindero
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | in this paper we empirically study the generalization performance as the size of the training set varies over multiple orders of magnitude. These systematic experiments lead to some interesting and potentially very useful observations |
| Researcher Affiliation | Industry | DeepMind, London, United Kingdom. Correspondence to: Jörg Bornschein <bornschein@google.com>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to code repositories. |
| Open Datasets | Yes | We conduct experiments on the following datasets: MNIST consists of 60k training and 10k test examples from 10 classes (LeCun, 1998). EMNIST provides 112,800 training and 18,800 test datapoints from 47 classes in its balanced subset (Cohen et al., 2017). CIFAR10 consists of 50k training and 10k test examples from 10 classes (Krizhevsky et al., 2009). ImageNet contains 1.28M training and 50k validation examples from 1000 classes (Russakovsky et al., 2015). |
| Dataset Splits | Yes | If not mentioned otherwise, we will use a 90%/10% training/calibration split of the available dataset. To assess a model's ability to generalise beyond the available dataset, we then successively evaluate them on a separate held-out or evaluation dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like Adam, Momentum SGD, RMSProp, and ReLU, but it does not specify any version numbers for these or other software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | Throughout this study we use the following optimizers: Adam [...] with fixed learning rates {10⁻⁴, 3·10⁻⁴, 10⁻³} and 50 epochs. Momentum SGD with initial learning rates {10⁻⁴, 3·10⁻⁴, ..., 10⁻¹} cosine-decaying over 50 epochs down to 0 (0.9 momentum and ε = 10⁻⁴). RMSProp + cosine schedule [...] with initial learning rates of {0.03, 0.1, 0.3} and cosine-decaying to 0 over 50 epochs. For all experiments we use a batch size of 256 examples. |
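The "Dataset Splits" row above describes a 90%/10% training/calibration split of the available data plus a separate held-out evaluation set. Since the paper releases no code, the following is only a minimal sketch of that protocol; the function name, seed handling, and the CIFAR-10 example are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the 90%/10% training/calibration split described in the paper.
# Not the authors' code; names and defaults here are assumptions.
import numpy as np

def train_calibration_split(num_examples: int, calib_fraction: float = 0.10, seed: int = 0):
    """Return index arrays for a 90%/10% train/calibration split."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_examples)
    num_calib = int(round(calib_fraction * num_examples))
    calib_idx = perm[:num_calib]      # 10% used for calibration / model selection
    train_idx = perm[num_calib:]      # 90% used for training
    return train_idx, calib_idx

# Example: split the 50k CIFAR-10 training examples; the official 10k test set
# would then play the role of the separate held-out evaluation set.
train_idx, calib_idx = train_calibration_split(50_000)
print(len(train_idx), len(calib_idx))  # 45000 5000
```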
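The "Experiment Setup" row quotes the optimizer and learning-rate grid used in the paper. The paper names no framework and provides no code, so the sketch below uses PyTorch purely as an assumption to make the quoted configuration concrete; the builder function and constant names are hypothetical. The Momentum-SGD grid is elided with "..." in the paper's text, so only the quoted endpoints appear here, and the quoted ε = 10⁻⁴ is not mapped because its role is not specified in the excerpt.

```python
# Hedged sketch of the quoted hyperparameter grid (not the authors' code).
import torch

BATCH_SIZE = 256   # "For all experiments we use a batch size of 256 examples."
EPOCHS = 50

ADAM_LRS = [1e-4, 3e-4, 1e-3]    # Adam: fixed learning rates, 50 epochs
SGD_LRS = [1e-4, 3e-4, 1e-1]     # Momentum SGD: quoted endpoints of an elided ("...") grid
RMSPROP_LRS = [0.03, 0.1, 0.3]   # RMSProp: cosine-decayed to 0 over 50 epochs

def make_optimizer(name, params, lr, steps_per_epoch):
    """Build one optimizer from the quoted grid, with a cosine schedule where the paper uses one."""
    if name == "adam":
        opt = torch.optim.Adam(params, lr=lr)
        sched = None  # fixed learning rate
    elif name == "momentum_sgd":
        opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(
            opt, T_max=EPOCHS * steps_per_epoch, eta_min=0.0)
    elif name == "rmsprop":
        opt = torch.optim.RMSprop(params, lr=lr)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(
            opt, T_max=EPOCHS * steps_per_epoch, eta_min=0.0)
    else:
        raise ValueError(f"unknown optimizer: {name}")
    return opt, sched
```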