Efficient Active Learning for Gaussian Process Classification by Error Reduction

Authors: Guang Zhao, Edward Dougherty, Byung-Jun Yoon, Francis Alexander, Xiaoning Qian

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we demonstrate the efficiency of our active learning algorithms, combined with either random optimization (NR-MOCU-RO, NR-SMOCU-RO) or Adagrad (NR-SMOCU-SGD), in the following sets of experiments. In the first set of experiments, we analyze and benchmark the running time of our algorithm by comparing it to the naive computation of the MOCU/SMOCU reduction. We then benchmark our algorithms against other active learning algorithms for both query synthesis on synthetic benchmark datasets and pool-based active learning on real-world datasets.
Researcher Affiliation | Academia | 1 Department of Electrical & Computer Engineering, 2 Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA; 3 Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
Pseudocode | Yes | The whole procedure is illustrated as pseudocode in Algorithm 1 in the Appendix. The pseudocode of NR-(S)MOCU-RO can be found in the Appendix (Algorithm 2). The query synthesis algorithm with the integral computation is summarized in the pseudocode: NR-SMOCU with Stochastic Gradient Descent (NR-SMOCU-SGD). (A generic sketch of the query-synthesis loop such pseudocode describes is given after this table.)
Open Source Code | Yes | The code for our experiments is made available at https://github.com/QianLab/NR_SMOCU_SGD_GPC.
Open Datasets | Yes | We also compare algorithms on the UCI datasets [2] for pool-based active learning. ... [2] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
Dataset Splits | No | For each dataset, we split it into training and testing datasets. We take the training dataset as the sampling pool for active learning; initially, we randomly choose two samples from each class for labelling and use them to estimate the GPC hyperparameters. The paper mentions training and testing sets, but does not provide specific details on a 'validation' split or the exact percentages/counts for any splits. (A sketch of this split-and-initialization protocol is given after this table.)
Hardware Specification | Yes | The algorithms are implemented in Python 3.7 on a personal computer with an Intel i5-10400 2.9 GHz CPU and 16 GB RAM.
Software Dependencies | No | The algorithms are implemented in Python 3.7. The paper mentions the programming language and its version, but does not list the specific software libraries, frameworks, or solvers, with version numbers, that would be needed for reproducibility.
Experiment Setup | Yes | MES, BALD, RO-MOCU, RO-SMOCU, ADF-MOCU, NR-MOCU-RO, and NR-SMOCU-RO are all optimized by random optimization with M1 = 1000. In NR-SMOCU-SGD, we first perform random optimization with M1 = 800 and set the best point as the initial point for Adagrad, so that NR-SMOCU-SGD has a running time similar to NR-SMOCU-RO at their corresponding setups for a fair comparison. ... We initially draw 100 samples for labeling to estimate the hyperparameters {γ, l}. (A sketch of this two-stage random-optimization-plus-Adagrad query optimization is given after this table.)
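The Pseudocode row refers to Algorithms 1 and 2 in the paper's Appendix. Below is a minimal, generic sketch of the query-synthesis active learning loop that such pseudocode describes, assuming a hypothetical `acquisition` function (e.g. a MOCU/SMOCU-reduction score), a `candidate_sampler` used for random optimization, an `oracle` that returns labels, and a GP classifier exposing `fit`; it is not a transcription of the paper's Algorithm 1.

```python
import numpy as np

def active_learning_loop(model, acquisition, candidate_sampler, oracle, n_queries):
    """Generic query-synthesis active learning loop (sketch, not the paper's code).

    model             -- a GP classifier exposing fit(X, y)
    acquisition       -- scores a candidate query, e.g. an expected (S)MOCU reduction
    candidate_sampler -- draws candidate query points for random optimization
    oracle            -- returns the label of a queried point
    """
    X, y = [], []
    for _ in range(n_queries):
        candidates = candidate_sampler()             # random candidates (random optimization)
        scores = [acquisition(model, x) for x in candidates]
        x_star = candidates[int(np.argmax(scores))]  # query maximizing the acquisition score
        X.append(x_star)
        y.append(oracle(x_star))                     # obtain the label from the oracle
        model.fit(np.asarray(X), np.asarray(y))      # refit the GPC on the enlarged labeled set
    return model, np.asarray(X), np.asarray(y)
```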
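The Dataset Splits row describes splitting each UCI dataset into a training pool and a test set, with two randomly chosen labeled samples per class as the initial labeled set. A sketch of that protocol is shown below, assuming scikit-learn's train_test_split and an illustrative 70/30 split ratio; the actual split ratio is not reported in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_pool(X, y, test_size=0.3, n_init_per_class=2, seed=0):
    """Split a dataset into a sampling pool and a test set, then pick the initial
    labeled points: two random samples from each class. The test_size value is an
    assumption for illustration only."""
    rng = np.random.default_rng(seed)
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)

    init_idx = []                                    # indices of the initial labeled samples
    for c in np.unique(y_pool):
        class_idx = np.flatnonzero(y_pool == c)
        init_idx.extend(rng.choice(class_idx, size=n_init_per_class, replace=False))
    return X_pool, y_pool, X_test, y_test, np.asarray(init_idx)
```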
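The Experiment Setup row describes a two-stage query optimization: random optimization over M1 candidates, with the best candidate used as the initial point for Adagrad in NR-SMOCU-SGD. A hedged sketch of that scheme is below; only the M1 values (1000 and 800) come from the paper, while `acq`, `acq_grad`, the number of Adagrad steps, and the learning rate are placeholders.

```python
import numpy as np

def optimize_query(acq, acq_grad, sample_candidates, m1=800, n_adagrad=50,
                   lr=0.1, eps=1e-8):
    """Two-stage query optimization sketch: random optimization over m1 candidates,
    then Adagrad ascent on the acquisition starting from the best candidate."""
    candidates = sample_candidates(m1)               # stage 1: random optimization
    best = candidates[int(np.argmax([acq(x) for x in candidates]))]

    x, g_accum = best.copy(), np.zeros_like(best)    # stage 2: Adagrad refinement
    for _ in range(n_adagrad):
        g = acq_grad(x)                              # gradient of the acquisition (e.g. SMOCU reduction)
        g_accum += g ** 2
        x += lr * g / (np.sqrt(g_accum) + eps)       # ascent step to maximize the acquisition
    return x if acq(x) >= acq(best) else best        # keep whichever point scores higher
```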