Batch Active Learning at Scale

Authors: Gui Citovsky, Giulia DeSalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct large scale experiments using a ResNet-101 model applied to multi-label Open Images Dataset consisting of almost 10M images and 60M labels over 20K classes, to demonstrate significant improvement Cluster-Margin provides over the baselines. In the best result, we find that Cluster-Margin requires only 40% of the labels needed by the next best method to achieve the same target performance. To compare against latest published results, we follow their experimental settings and conduct smaller scale experiments using a VGG16 model on multiclass CIFAR10, CIFAR100, and SVHN datasets, and show Cluster-Margin algorithm's competitive performance.
Researcher Affiliation | Industry | Gui Citovsky, Giulia DeSalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar; Google Research; {gcitovsky,giuliad,cgentile,lkary,anandbr,rostami,sanjivk}@google.com
Pseudocode | Yes | Algorithm 1 Hierarchical Agglomerative Clustering (HAC) with Average-Linkage. Algorithm 2 The Cluster-Margin Algorithm. (An illustrative sketch of the selection step follows the table.)
Open Source Code | No | The paper does not contain an explicit statement about the release of its source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We leverage the Open Images v6 image classification dataset [Krasin et al., 2017] to evaluate Cluster-Margin and other active learning methods in the very large batch-size setting, i.e. batch-sizes of 100K and 1M. Specifically, we consider CIFAR10, CIFAR100, and SVHN, which are datasets that contain 32-by-32 color images [Krizhevsky, 2009, Netzer et al., 2011]. (A loading sketch for the small-scale datasets follows the table.)
Dataset Splits | Yes | Table 1: Open Images Dataset v6 statistics by data split. Train: 9,011,219 images, 19,856,086 positives, 37,668,266 negatives. Validation: 41,620 images, 367,263 positives, 228,076 negatives. Test: 125,436 images, 1,110,124 positives, 689,759 negatives.
Hardware Specification | Yes | We train a ResNet-101 model implemented using tf-slim with batch SGD using 64 Cloud TPU v4s each with two cores.
Software Dependencies | No | The paper mentions using 'tf-slim' and the 'tf-keras' library but does not specify version numbers for these or any other software dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | We train a ResNet-101 model implemented using tf-slim with batch SGD using 64 Cloud TPU v4s each with two cores. Each core is fed 48 examples per SGD iteration, resulting in an effective SGD batch of size 64 * 2 * 48 = 6144. The SGD optimizer decays the learning rate logarithmically after every 5 * 10^8 examples and uses an initial learning rate of 10^-4. For the smaller-scale experiments, we use batch SGD with the learning rate fixed to 0.001 and SGD's batch size set to 100. (A configuration sketch follows the table.)
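
The Pseudocode row above refers to the paper's Algorithm 1 (average-linkage HAC) and Algorithm 2 (Cluster-Margin). The sketch below is a minimal illustration of the selection step only, assuming the margin is the gap between the two largest predicted class probabilities and substituting scikit-learn's average-linkage agglomerative clustering for the paper's own HAC routine; all function and variable names are illustrative, not taken from the paper.

```python
# Illustrative sketch of a Cluster-Margin-style selection step.
# Assumptions: margin = gap between the two largest predicted probabilities;
# scikit-learn average-linkage HAC stands in for the paper's Algorithm 1.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_margin_select(probs, embeddings, k_margin, k_target, distance_threshold=1.0):
    # Margin score: difference between the top-2 class probabilities per example.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margins = top2[:, 1] - top2[:, 0]

    # Keep the k_margin lowest-margin (most uncertain) candidates.
    candidates = np.argsort(margins)[:k_margin]

    # Average-linkage HAC over the candidates' embeddings. (The paper clusters
    # the full unlabeled pool once up front; clustering only the candidates
    # here keeps the sketch short.)
    hac = AgglomerativeClustering(n_clusters=None,
                                  distance_threshold=distance_threshold,
                                  linkage="average")
    labels = hac.fit_predict(embeddings[candidates])

    # Round-robin over clusters, smallest first, until k_target examples are chosen.
    clusters = [list(candidates[labels == c]) for c in np.unique(labels)]
    clusters.sort(key=len)
    selected = []
    while len(selected) < k_target and any(clusters):
        for cluster in clusters:
            if cluster and len(selected) < k_target:
                selected.append(cluster.pop())
    return np.array(selected)
```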
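
The small-scale datasets quoted in the Open Datasets row are available through standard loaders. The snippet below is a sketch using tensorflow_datasets; this is an assumption, since the paper does not state how the datasets were obtained.

```python
# Sketch: fetch the small-scale datasets named in the paper via tensorflow_datasets.
# The choice of tensorflow_datasets is an assumption, not stated by the authors.
import tensorflow_datasets as tfds

for name in ("cifar10", "cifar100", "svhn_cropped"):
    (train_ds, test_ds), info = tfds.load(name,
                                          split=["train", "test"],
                                          as_supervised=True,
                                          with_info=True)
    print(name,
          info.splits["train"].num_examples,
          info.splits["test"].num_examples)
```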
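
The Experiment Setup row quotes an effective SGD batch of 64 * 2 * 48 = 6144 for the Open Images experiments. The sketch below reproduces that arithmetic and the stated initial learning rate, using a standard tf.keras SGD optimizer as a stand-in for the paper's tf-slim training loop; the logarithmic decay schedule is not reconstructed because the paper does not give its functional form.

```python
# Sketch of the large-scale training configuration quoted above.
# Only numbers stated in the paper are used; the tf.keras SGD optimizer is a
# stand-in for the paper's tf-slim setup (an assumption).
import tensorflow as tf

NUM_TPUS = 64          # Cloud TPU v4 devices
CORES_PER_TPU = 2
EXAMPLES_PER_CORE = 48
effective_batch = NUM_TPUS * CORES_PER_TPU * EXAMPLES_PER_CORE
assert effective_batch == 6144  # matches the figure quoted in the paper

INITIAL_LR = 1e-4      # initial learning rate for the Open Images experiments
optimizer = tf.keras.optimizers.SGD(learning_rate=INITIAL_LR)

# The paper states the learning rate is decayed "logarithmically after every
# 5 * 10^8 examples"; the exact decay function is not given, so no schedule
# is built here.
```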