Efficient Active Learning for Gaussian Process Classification by Error Reduction

Authors: Guang Zhao, Edward Dougherty, Byung-Jun Yoon, Francis Alexander, Xiaoning Qian

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we demonstrate the efficiency of our active learning algorithms, combined with either random optimization (NR-MOCU-RO, NR-SMOCU-RO) or Adagrad (NR-SMOCU-SGD), in the following sets of experiments. In the first set of experiments, we analyze and benchmark the running time of our algorithm by comparing it to the naive computation of the MOCU/SMOCU reduction. We then benchmark our algorithms against other active learning algorithms for both query synthesis on synthetic benchmark datasets and pool-based active learning on real-world datasets.
Researcher Affiliation | Academia | 1 Department of Electrical & Computer Engineering, 2 Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, USA; 3 Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, USA
Pseudocode | Yes | The whole procedure is illustrated as pseudocode in Algorithm 1 in the Appendix. The pseudocode of NR-(S)MOCU-RO can be found in the Appendix (Algorithm 2). The query synthesis algorithm with the integral computation is summarized in the pseudocode: NR-SMOCU with Stochastic Gradient Descent (NR-SMOCU-SGD). (A generic sketch of the query-synthesis loop such pseudocode describes is given after this table.)
Open Source Code | Yes | The code for our experiments is made available at https://github.com/QianLab/NR_SMOCU_SGD_GPC.
Open Datasets | Yes | We also compare algorithms on the UCI datasets [2] for pool-based active learning. ... [2] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017.
Dataset Splits | No | For each dataset, we split it into training and testing datasets. We take the training dataset as the sampling pool for active learning; initially, we randomly choose two samples from each class for labelling and use them to estimate the GPC hyperparameters. The paper mentions training and testing sets, but does not provide specific details on a 'validation' split or the exact percentages/counts for any splits. (A sketch of this split-and-initialization protocol is given after this table.)
Hardware Specification | Yes | The algorithms are implemented in Python 3.7 on a personal computer with an Intel i5-10400 2.9 GHz CPU and 16 GB RAM.
Software Dependencies | No | The algorithms are implemented in Python 3.7. The paper mentions the programming language and its version, but does not list the specific software libraries, frameworks, or solvers, with version numbers, that would be needed for reproducibility.
Experiment Setup | Yes | MES, BALD, RO-MOCU, RO-SMOCU, ADF-MOCU, NR-MOCU-RO, and NR-SMOCU-RO are all optimized by random optimization with M1 = 1000. In NR-SMOCU-SGD, we first perform random optimization with M1 = 800 and set the best point as the initial point for Adagrad, so that NR-SMOCU-SGD has a running time similar to NR-SMOCU-RO at their corresponding setups for a fair comparison. ... We initially draw 100 samples for labeling to estimate the hyperparameters {γ, l}. (A sketch of this two-stage random-optimization-plus-Adagrad query optimization is given after this table.)
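The Pseudocode row refers to Algorithms 1 and 2 in the paper's Appendix. Below is a minimal, generic sketch of the query-synthesis active learning loop that such pseudocode describes, assuming a hypothetical `acquisition` function (e.g. a MOCU/SMOCU-reduction score), a `candidate_sampler` used for random optimization, an `oracle` that returns labels, and a GP classifier exposing `fit`; it is not a transcription of the paper's Algorithm 1.

```python
import numpy as np

def active_learning_loop(model, acquisition, candidate_sampler, oracle, n_queries):
    """Generic query-synthesis active learning loop (sketch, not the paper's code).

    model             -- a GP classifier exposing fit(X, y)
    acquisition       -- scores a candidate query, e.g. an expected (S)MOCU reduction
    candidate_sampler -- draws candidate query points for random optimization
    oracle            -- returns the label of a queried point
    """
    X, y = [], []
    for _ in range(n_queries):
        candidates = candidate_sampler()             # random candidates (random optimization)
        scores = [acquisition(model, x) for x in candidates]
        x_star = candidates[int(np.argmax(scores))]  # query maximizing the acquisition score
        X.append(x_star)
        y.append(oracle(x_star))                     # obtain the label from the oracle
        model.fit(np.asarray(X), np.asarray(y))      # refit the GPC on the enlarged labeled set
    return model, np.asarray(X), np.asarray(y)
```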
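The Dataset Splits row describes splitting each UCI dataset into a training pool and a test set, with two randomly chosen labeled samples per class as the initial labeled set. A sketch of that protocol is shown below, assuming scikit-learn's train_test_split and an illustrative 70/30 split ratio; the actual split ratio is not reported in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_pool(X, y, test_size=0.3, n_init_per_class=2, seed=0):
    """Split a dataset into a sampling pool and a test set, then pick the initial
    labeled points: two random samples from each class. The test_size value is an
    assumption for illustration only."""
    rng = np.random.default_rng(seed)
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y)

    init_idx = []                                    # indices of the initial labeled samples
    for c in np.unique(y_pool):
        class_idx = np.flatnonzero(y_pool == c)
        init_idx.extend(rng.choice(class_idx, size=n_init_per_class, replace=False))
    return X_pool, y_pool, X_test, y_test, np.asarray(init_idx)
```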
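The Experiment Setup row describes a two-stage query optimization: random optimization over M1 candidates, with the best candidate used as the initial point for Adagrad in NR-SMOCU-SGD. A hedged sketch of that scheme is below; only the M1 values (1000 and 800) come from the paper, while `acq`, `acq_grad`, the number of Adagrad steps, and the learning rate are placeholders.

```python
import numpy as np

def optimize_query(acq, acq_grad, sample_candidates, m1=800, n_adagrad=50,
                   lr=0.1, eps=1e-8):
    """Two-stage query optimization sketch: random optimization over m1 candidates,
    then Adagrad ascent on the acquisition starting from the best candidate."""
    candidates = sample_candidates(m1)               # stage 1: random optimization
    best = candidates[int(np.argmax([acq(x) for x in candidates]))]

    x, g_accum = best.copy(), np.zeros_like(best)    # stage 2: Adagrad refinement
    for _ in range(n_adagrad):
        g = acq_grad(x)                              # gradient of the acquisition (e.g. SMOCU reduction)
        g_accum += g ** 2
        x += lr * g / (np.sqrt(g_accum) + eps)       # ascent step to maximize the acquisition
    return x if acq(x) >= acq(best) else best        # keep whichever point scores higher
```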