Active Testing: Sample-Efficient Model Evaluation

Authors: Jannik Kossen, Sebastian Farquhar, Yarin Gal, Tom Rainforth

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate its effectiveness on models including WideResNets and Gaussian processes on datasets including Fashion-MNIST and CIFAR-100." |
| Researcher Affiliation | Academia | "1OATML, Department of Computer Science, 2Department of Statistics, Oxford. Correspondence to: Jannik Kossen <jannik.kossen@cs.ox.ac.uk>." |
| Pseudocode | Yes | "Algorithm 1 Active Testing. Input: Model f trained on data Dtrain" |
| Open Source Code | Yes | "Full details as well as additional results are provided in the appendix, and we release code for reproducing the results at github.com/jlko/active-testing." |
| Open Datasets | Yes | "We demonstrate its effectiveness on models including WideResNets and Gaussian processes on datasets including Fashion-MNIST and CIFAR-100. ... on the MNIST dataset (LeCun et al., 1998) ... Fashion-MNIST (Xiao et al., 2017) ... CIFAR-10 (Krizhevsky et al., 2009)." |
| Dataset Splits | No | The paper mentions 'training data' and 'test data' extensively but does not specify a separate 'validation' split or its proportions. While standard datasets often have predefined splits, the paper does not explicitly state the validation split used for its experiments. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments (e.g., specific GPU or CPU models). |
| Software Dependencies | No | The paper mentions software such as 'PyTorch' (Paszke et al., 2019) and 'scikit-learn' (Pedregosa et al., 2011) but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper describes the models, surrogates, and datasets used, but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations required for direct reproduction. |
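The Pseudocode row points at the paper's Algorithm 1, which actively selects test points with a proposal distribution and then corrects the resulting sampling bias. A minimal sketch of that correction step is below, assuming the LURE importance weights of Farquhar et al. (2021) that the Active Testing paper builds on; the function names (`lure_weights`, `lure_estimate`) and the uniform-proposal demo are illustrative, not taken from the released codebase at github.com/jlko/active-testing.

```python
def lure_weights(q_probs, N):
    """Illustrative LURE weights (Farquhar et al., 2021) for M < N points
    drawn without replacement from a pool of size N.

    q_probs[m-1] is the proposal probability that the acquisition
    distribution assigned to the point actually picked at step m.
    """
    M = len(q_probs)
    weights = []
    for m, q in enumerate(q_probs, start=1):
        # v_m = 1 + (N - M) / (N - m) * (1 / ((N - m + 1) * q) - 1)
        # Assumes m < N, so the denominator (N - m) is never zero.
        v = 1.0 + (N - M) / (N - m) * (1.0 / ((N - m + 1) * q) - 1.0)
        weights.append(v)
    return weights


def lure_estimate(losses, q_probs, N):
    """Weighted average of observed losses: an unbiased risk estimate
    despite the non-uniform, adaptive selection of test points."""
    ws = lure_weights(q_probs, N)
    return sum(w * l for w, l in zip(ws, losses)) / len(losses)


# Sanity check: with a uniform proposal q_m = 1/(N - m + 1), every
# weight collapses to 1 and the estimator reduces to the plain mean.
N, M = 100, 10
uniform_q = [1.0 / (N - m + 1) for m in range(1, M + 1)]
print(lure_weights(uniform_q, N))  # all 1.0
print(lure_estimate([0.5] * M, uniform_q, N))
```

The design point worth noticing is that the weights depend on the step index m, not just on q: later draws come from a smaller remaining pool, so the correction shrinks as m approaches N.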