Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels

Authors: Mohamad Amin Mohamadi, Wonho Bae, Danica J. Sutherland

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a new method for approximating active learning acquisition strategies that are based on retraining with hypothetically-labeled candidate data points. Although this is usually infeasible with deep networks, we use the neural tangent kernel to approximate the result of retraining, and prove that this approximation works asymptotically even in an active learning setup, approximating "look-ahead" selection criteria with far less computation required. This also enables us to conduct sequential active learning, i.e. updating the model in a streaming regime, without needing to retrain the model with SGD after adding each new data point. Moreover, our querying strategy, which better understands how the model's predictions will change by adding new data points, in comparison to the standard ("myopic") criteria, beats other look-ahead strategies by large margins, and achieves equal or better performance compared to state-of-the-art methods on several benchmark datasets in pool-based active learning.
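To make the summarized method concrete, below is a minimal sketch (not the authors' released code) of the core idea: approximating "retrain on the labeled set plus one hypothetically-labeled candidate" with the empirical NTK at the current parameters, so no per-candidate SGD run is needed. The function and argument names (lookahead_predictions, apply_fn, reg) are illustrative assumptions, and the small ridge term is added only for numerical stability.

```python
import jax.numpy as jnp
import neural_tangents as nt

def lookahead_predictions(apply_fn, params, x_lab, y_lab, x_cand, y_hyp, x_test, reg=1e-4):
    """Approximate the predictions of a network retrained on the labeled pool plus one
    hypothetically-labeled candidate, using the empirical NTK at the current parameters
    instead of rerunning SGD (the look-ahead idea summarized above)."""
    # Scalar-valued empirical NTK, Theta(x1, x2), contracted over output dimensions.
    ntk_fn = nt.empirical_ntk_fn(apply_fn, trace_axes=(-1,))
    x_tr = jnp.concatenate([x_lab, x_cand[None]], axis=0)
    y_tr = jnp.concatenate([y_lab, y_hyp[None]], axis=0)   # one-hot / regression targets
    k_tt = ntk_fn(x_tr, x_tr, params)                      # (n_tr, n_tr)
    k_st = ntk_fn(x_test, x_tr, params)                    # (n_test, n_tr)
    resid = y_tr - apply_fn(params, x_tr)                  # residuals of the current model
    # Kernel-regression form of training the linearized model to convergence under L2 loss;
    # `reg` is a small ridge term for numerical stability (an assumption, not from the paper).
    coef = jnp.linalg.solve(k_tt + reg * jnp.eye(k_tt.shape[0]), resid)
    return apply_fn(params, x_test) + k_st @ coef
```

Scoring a candidate then amounts to comparing these look-ahead predictions against the current model's predictions, once per hypothetical label, without any per-candidate retraining.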
Researcher Affiliation | Academia | Mohamad Amin Mohamadi, University of British Columbia (lemohama@cs.ubc.ca); Wonho Bae, University of British Columbia (whbae@cs.ubc.ca); Danica J. Sutherland, University of British Columbia and Alberta Machine Intelligence Institute (dsuth@cs.ubc.ca)
Pseudocode | Yes | Algorithm 1: Active learning using NTKs
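The paper's Algorithm 1 is not reproduced here; the following is only an illustrative outer loop in the spirit of that description, where `score_fn` could wrap a look-ahead criterion built on the NTK approximation sketched above and `retrain_fn` updates the model once per cycle. All names are assumptions.

```python
import numpy as np

def active_learning_loop(score_fn, retrain_fn, params, labeled_idx, pool_idx,
                         num_cycles, budget):
    """Generic pool-based active learning loop: score the pool, query the top-`budget`
    points, then retrain once per cycle (not once per candidate)."""
    labeled_idx, pool_idx = list(labeled_idx), list(pool_idx)
    for _ in range(num_cycles):
        scores = np.array([score_fn(params, labeled_idx, i) for i in pool_idx])
        picked = [pool_idx[j] for j in np.argsort(-scores)[:budget]]  # top-budget points
        labeled_idx += picked
        pool_idx = [i for i in pool_idx if i not in picked]
        params = retrain_fn(params, labeled_idx)
    return params, labeled_idx
```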
Open Source Code | Yes | We provide anonymous code for the full pipeline in the supplementary material.
Open Datasets | Yes | Datasets. To demonstrate the effectiveness of the proposed method, we provide experimental results on four benchmark datasets for classification tasks: MNIST [8], SVHN [9], CIFAR10 [10], and CIFAR100 [10].
Dataset Splits | Yes | MNIST consists of 10 hand-written digits with 60,000 training and 10,000 test images, at size 28×28. SVHN also consists of 10 digits with 73,257 training and 26,032 test images, at size 32×32; CIFAR10 contains 50,000 training and 10,000 test images, also 32×32 and equally split between 10 classes like airplane, frog, and truck.
Hardware Specification | Yes | The line plots represent accuracy, whereas bar plots represent wall-clock time to query data, using an NVIDIA V100 GPU.
Software Dependencies | No | We implement a pipeline for the proposed method using PyTorch [41] and JAX [42]; our neural networks f are implemented in PyTorch, whereas the linearized models f_lin are implemented in JAX with the neural-tangents library [43]. The paper mentions software names but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | We employ a ResNet18 [44] and WideResNet [45] with one or two layers and maximum width of 640... As mentioned earlier, we use L2 loss... For the naïve version, we retrain the neural network for 15 epochs using SGD. ...At each cycle, we query 20, 1,000, 1,000, and 1,000 new data points on MNIST, SVHN, CIFAR10, and CIFAR100 from the subset of 4,000, 6,000, 6,000, and 6,000 unlabeled data points, and initialize the labeled set L with 100, 1,000, 1,000, and 10,000 data points.
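For quick reference, the per-dataset quantities quoted above can be collected into a plain configuration dictionary; the key names below are illustrative, while the numbers are those reported in the row above.

```python
# Per-cycle active learning setup as reported above (key names are illustrative).
AL_SETUP = {
    # dataset:   initial |L|,            pool subset size,      points queried per cycle
    "MNIST":    {"init_labeled": 100,    "pool_subset": 4_000,  "query_size": 20},
    "SVHN":     {"init_labeled": 1_000,  "pool_subset": 6_000,  "query_size": 1_000},
    "CIFAR10":  {"init_labeled": 1_000,  "pool_subset": 6_000,  "query_size": 1_000},
    "CIFAR100": {"init_labeled": 10_000, "pool_subset": 6_000,  "query_size": 1_000},
}
NAIVE_RETRAIN_EPOCHS = 15  # SGD epochs reported for the naïve full-retraining baseline
```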