Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation
Authors: Jannik Kossen, Sebastian Farquhar, Yarin Gal, Thomas Rainforth
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We next study the performance of ASEs for active testing in comparison to relevant baselines. Concretely, we compare to naive MC and the current state-of-the-art LURE-based active testing approach by Kossen et al. [38]. |
| Researcher Affiliation | Collaboration | Jannik Kossen (1), Sebastian Farquhar (1,3), Yarin Gal (1), Tom Rainforth (2); 1: OATML, Department of Computer Science, University of Oxford; 2: Department of Statistics, University of Oxford; 3: DeepMind |
| Pseudocode | Yes | Algorithm 1 Adaptive Refinement of ASEs |
| Open Source Code | Yes | We release code for ASEs in the supplement. |
| Open Datasets | Yes | Concretely, we are given a fixed model trained on 2000 digits of MNIST [42]... In each case, we train the model to be evaluated, f, on a training set containing 40 000 points, and then use an evaluation pool of size N = 2000. |
| Dataset Splits | No | The paper specifies training and test/evaluation set sizes but does not explicitly detail a separate validation set size or splitting methodology for validation data. |
| Hardware Specification | Yes | All experiments were run on a local machine with 8 NVIDIA 2080 Ti GPUs and 2 Intel Xeon E5-2630 v4 CPUs, 256GB of RAM. |
| Software Dependencies | Yes | Experiments were run using PyTorch 1.10.0 [52] and Python 3.9 [66]. |
| Experiment Setup | No | The main text states, 'We give full details on the experiments in the appendix. In particular, B and C contain additional results and figures, and D gives further details on the computation of XWED and the baselines.' Specific numerical hyperparameters are thus detailed in the appendix, not the main text. |
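The table above quotes the paper's comparison of naive MC against the state-of-the-art LURE-based active-testing baseline of Kossen et al. As a rough illustration of what that baseline computes, here is a minimal sketch of the LURE estimator from Farquhar et al. (2021), on which that approach builds. The `q_fn` proposal, pool contents, and acquisition budget below are illustrative placeholders, not values from the paper.

```python
import random

def lure_estimate(losses, q_fn, M, rng=None):
    """Sketch of the LURE estimator (Farquhar et al., 2021).

    losses: per-point losses for the full evaluation pool (in real active
            testing these are only revealed as points are labelled).
    q_fn:   maps the list of still-unlabelled indices to one probability each.
    M:      number of points to acquire; must be smaller than the pool size.
    """
    rng = rng or random
    N = len(losses)
    remaining = list(range(N))
    total = 0.0
    for m in range(1, M + 1):
        probs = q_fn(remaining)
        # Sample one index without replacement from the acquisition proposal.
        r = rng.random()
        acc = 0.0
        j = len(remaining) - 1
        for k, p in enumerate(probs):
            acc += p
            if r <= acc:
                j = k
                break
        q = probs[j]
        idx = remaining.pop(j)
        # LURE weight: corrects the bias introduced by non-uniform,
        # without-replacement sampling of which points to label.
        v = 1.0 + (N - M) / (N - m) * (1.0 / ((N - m + 1) * q) - 1.0)
        total += v * losses[idx]
    return total / M
```

With a uniform proposal every weight collapses to 1 and the estimator reduces to the plain Monte-Carlo mean of the labelled losses, which is the naive MC baseline the paper compares against.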