Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation

Authors: Jannik Kossen, Sebastian Farquhar, Yarin Gal, Thomas Rainforth

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We next study the performance of ASEs for active testing in comparison to relevant baselines. Concretely, we compare to naive MC and the current state-of-the-art LURE-based active testing approach by Kossen et al. [38]." (Both baseline estimators are sketched after the table.)
Researcher Affiliation | Collaboration | Jannik Kossen (1), Sebastian Farquhar (1,3), Yarin Gal (1), Tom Rainforth (2); (1) OATML, Department of Computer Science, University of Oxford; (2) Department of Statistics, University of Oxford; (3) DeepMind
Pseudocode | Yes | "Algorithm 1: Adaptive Refinement of ASEs" (a hedged sketch of such a refinement loop follows the table)
Open Source Code | Yes | "We release code for ASEs in the supplement."
Open Datasets | Yes | "Concretely, we are given a fixed model trained on 2000 digits of MNIST [42]... In each case, we train the model to be evaluated, f, on a training set containing 40 000 points, and then use an evaluation pool of size N = 2000."
Dataset Splits | No | The paper specifies training and test/evaluation set sizes but does not explicitly detail a separate validation set size or a splitting methodology for validation data.
Hardware Specification | Yes | "All experiments were run on a local machine with 8 NVIDIA 2080 Ti GPUs, 2 Intel Xeon E5-2630 v4 CPUs, and 256 GB of RAM."
Software Dependencies | Yes | "Experiments were run using PyTorch 1.10.0 [52] and Python 3.9 [66]."
Experiment Setup | No | The main text states: "We give full details on the experiments in the appendix. In particular, B and C contain additional results and figures, and D gives further details on the computation of XWED and the baselines." Specific numerical hyperparameters are thus detailed in the appendix, not the main text.
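
For context on the baselines named in the Research Type row, the estimators below are our reconstruction from the active testing literature (Farquhar et al.; Kossen et al. [38]), not formulas quoted from this paper; in particular, the exact form of the LURE weights v_m should be treated as an assumption.

```latex
% Naive Monte Carlo: average loss of the fixed model f over M uniformly
% sampled labelled points i_1, ..., i_M from an evaluation pool of size N.
\hat{R}_{\mathrm{MC}} = \frac{1}{M} \sum_{m=1}^{M} L\bigl(f(x_{i_m}),\, y_{i_m}\bigr)

% LURE: points are sampled actively with proposal probabilities q(i_m);
% the weights v_m correct for the non-uniform sampling so that the
% estimator remains unbiased.
\hat{R}_{\mathrm{LURE}} = \frac{1}{M} \sum_{m=1}^{M} v_m\, L\bigl(f(x_{i_m}),\, y_{i_m}\bigr),
\qquad
v_m = 1 + \frac{N - M}{N - m} \left( \frac{1}{(N - m + 1)\, q(i_m)} - 1 \right)
```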
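
The Pseudocode row cites Algorithm 1 (Adaptive Refinement of ASEs) without reproducing it. The sketch below is a minimal, hypothetical rendering of such a loop, assuming the ASE takes the form of an expected loss under a surrogate's label distribution averaged over the pool; the disagreement-based acquisition is a stand-in for the paper's XWED criterion, and all names in the script are ours, not the authors'.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy setup: a fixed model f trained on separate data, and an unlabelled
# evaluation pool of N points whose true labels are hidden from the estimator.
N, d = 2000, 10
true_w = rng.normal(size=d)
X_train = rng.normal(size=(200, d))
y_train = (X_train @ true_w > 0).astype(int)
f = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # model to evaluate

X_pool = rng.normal(size=(N, d))
y_pool = (X_pool @ true_w > 0).astype(int)  # oracle labels, revealed one at a time

def cross_entropy(p, y):
    """Per-point binary log loss of predicted probabilities p against labels y."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def ase_estimate(surrogate):
    """ASE sketch: expected loss of f under the surrogate's label
    distribution, averaged over the entire pool (no extra labels needed)."""
    p_f = f.predict_proba(X_pool)[:, 1]
    p_s = surrogate.predict_proba(X_pool)[:, 1]
    return np.mean(p_s * cross_entropy(p_f, 1) + (1 - p_s) * cross_entropy(p_f, 0))

# Seed set containing both classes so the surrogate can be fit at all.
labelled = [int(np.argmax(y_pool == 0)), int(np.argmax(y_pool == 1))]

for _ in range(100):  # label budget
    # 1. Refit the surrogate on all labels acquired so far.
    surrogate = LogisticRegression(max_iter=1000).fit(X_pool[labelled], y_pool[labelled])

    # 2. Acquire the next label. The paper's acquisition is XWED; this
    #    disagreement score between f and the surrogate is a stand-in.
    scores = np.abs(f.predict_proba(X_pool)[:, 1] - surrogate.predict_proba(X_pool)[:, 1])
    scores[labelled] = -np.inf  # never re-acquire an already-labelled point
    labelled.append(int(np.argmax(scores)))

surrogate = LogisticRegression(max_iter=1000).fit(X_pool[labelled], y_pool[labelled])
print(f"ASE estimate of f's pool risk: {ase_estimate(surrogate):.4f}")
p_f = f.predict_proba(X_pool)[:, 1]
print(f"True pool risk of f:           {np.mean(cross_entropy(p_f, y_pool)):.4f}")
```

Each iteration refits the surrogate on every label acquired so far and then spends one more label where the surrogate and the evaluated model disagree most; the ASE itself needs no further labels, since it averages the surrogate-expected loss over the whole pool.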