Active Testing: Sample-Efficient Model Evaluation
Authors: Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its effectiveness on models including Wide ResNets and Gaussian processes on datasets including Fashion-MNIST and CIFAR-100. |
| Researcher Affiliation | Academia | ¹OATML, Department of Computer Science, ²Department of Statistics, Oxford. Correspondence to: Jannik Kossen <jannik.kossen@cs.ox.ac.uk>. |
| Pseudocode | Yes | Algorithm 1: Active Testing. Input: Model f trained on data D_train |
| Open Source Code | Yes | Full details as well as additional results are provided in the appendix, and we release code for reproducing the results at github.com/jlko/active-testing. |
| Open Datasets | Yes | We demonstrate its effectiveness on models including Wide ResNets and Gaussian processes on datasets including Fashion-MNIST and CIFAR-100. ... on the MNIST dataset (LeCun et al., 1998) ... Fashion-MNIST (Xiao et al., 2017) ... CIFAR-10 (Krizhevsky et al., 2009). |
| Dataset Splits | No | The paper mentions using 'training data' and 'test data' extensively but does not specify a separate 'validation' split or its proportions. While standard datasets often have predefined splits, the paper does not explicitly state the validation split used for its experiments. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments (e.g., specific GPU or CPU models). |
| Software Dependencies | No | The paper mentions software such as 'PyTorch' (Paszke et al., 2019) and 'scikit-learn' (Pedregosa et al., 2011) but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper describes the models, surrogates, and datasets used, but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations required for direct reproduction. |