Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds

Authors: Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that BADGE is robust to architecture choice, batch size, and dataset, generally performing as well as or better than the best baseline across our experiments, which vary all of the aforementioned environmental conditions. We begin by introducing our notation and setting, followed by a description of the BADGE algorithm in Section 3 and experiments in Section 4.
Researcher Affiliation | Collaboration | Jordan T. Ash (Princeton University); Chicheng Zhang (University of Arizona); Akshay Krishnamurthy (Microsoft Research NYC); John Langford (Microsoft Research NYC); Alekh Agarwal (Microsoft Research Redmond)
Pseudocode | Yes | Algorithm 1 BADGE: Batch Active learning by Diverse Gradient Embeddings. (A sketch of the selection step follows the table.)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We evaluate our algorithms using three image datasets, SVHN (Netzer et al., 2011), CIFAR10 (Krizhevsky, 2009) and MNIST (LeCun et al., 1998), and four non-image datasets from the OpenML repository (#6, #155, #156, and #184).
Dataset Splits | No | The paper mentions training but does not explicitly describe a validation split. For instance, it says 'M = 100 being the number of initial random labeled examples', yet it does not detail train/validation/test splits for the datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | Yes | All models are trained in PyTorch (Paszke et al., 2017). Baselines use implementations from the libact library (Yang et al., 2017).
Experiment Setup | Yes | We use a learning rate of 0.001 for image data and of 0.0001 for non-image data. We avoid warm starting and retrain models from scratch every time new samples are queried (Ash and Adams, 2019). (A training-loop sketch follows the table.)
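
The Pseudocode row above refers to Algorithm 1 of the paper. As a reading aid, here is a minimal NumPy sketch of the two steps that algorithm combines, assuming the standard BADGE formulation: gradient embeddings of the last-layer weights computed with the model's own predicted (hypothetical) labels, followed by k-means++ seeding over those embeddings to select a diverse, high-magnitude batch. Function names, the uniform first pick, and the random seed are illustrative choices, not taken from the paper.

import numpy as np

def gradient_embeddings(probs, penultimate):
    # probs:       (n, K) softmax outputs of the current model
    # penultimate: (n, d) penultimate-layer features h(x)
    # returns:     (n, K*d) embeddings (p - e_yhat) outer h(x), flattened per example
    n = probs.shape[0]
    yhat = probs.argmax(axis=1)
    scale = probs.copy()
    scale[np.arange(n), yhat] -= 1.0  # p_i - 1[i == yhat]
    return (scale[:, :, None] * penultimate[:, None, :]).reshape(n, -1)

def kmeans_pp_select(emb, batch_size, rng=None):
    # k-means++ seeding over the embeddings: repeatedly sample a point with
    # probability proportional to its squared distance from the chosen set.
    if rng is None:
        rng = np.random.default_rng(0)
    chosen = [int(rng.integers(len(emb)))]  # uniform first pick (assumption)
    d2 = np.sum((emb - emb[chosen[0]]) ** 2, axis=1)
    while len(chosen) < batch_size:
        idx = int(rng.choice(len(emb), p=d2 / d2.sum()))
        chosen.append(idx)
        d2 = np.minimum(d2, np.sum((emb - emb[idx]) ** 2, axis=1))
    return chosen  # indices of unlabeled points to query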
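
Similarly, the Experiment Setup row quotes only the learning rates and the retrain-from-scratch policy. The sketch below shows one way to follow that policy in PyTorch; the Adam optimizer, the epoch count, and the helper names (make_model, labeled_loader) are assumptions made for illustration and are not stated in the quoted text.

import torch

def train_from_scratch(make_model, labeled_loader, is_image_data, epochs=100, device="cpu"):
    # Re-initialize the network after every query round (no warm starting),
    # as in the quoted setup. Only the learning rates (0.001 for image data,
    # 0.0001 for non-image data) come from the paper; the rest is assumed.
    model = make_model().to(device)  # fresh weights each round
    lr = 1e-3 if is_image_data else 1e-4
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in labeled_loader:
            optimizer.zero_grad()
            loss_fn(model(x.to(device)), y.to(device)).backward()
            optimizer.step()
    return model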