Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels
Authors: Mohamad Amin Mohamadi, Wonho Bae, Danica J. Sutherland
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a new method for approximating active learning acquisition strategies that are based on retraining with hypothetically-labeled candidate data points. Although this is usually infeasible with deep networks, we use the neural tangent kernel to approximate the result of retraining, and prove that this approximation works asymptotically even in an active learning setup, approximating "look-ahead" selection criteria with far less computation required. This also enables us to conduct sequential active learning, i.e. updating the model in a streaming regime, without needing to retrain the model with SGD after adding each new data point. Moreover, our querying strategy, which better understands how the model's predictions will change by adding new data points in comparison to the standard ("myopic") criteria, beats other look-ahead strategies by large margins, and achieves equal or better performance compared to state-of-the-art methods on several benchmark datasets in pool-based active learning. |
| Researcher Affiliation | Academia | Mohamad Amin Mohamadi, University of British Columbia (lemohama@cs.ubc.ca); Wonho Bae, University of British Columbia (whbae@cs.ubc.ca); Danica J. Sutherland, University of British Columbia and Alberta Machine Intelligence Institute (dsuth@cs.ubc.ca) |
| Pseudocode | Yes | Algorithm 1: Active learning using NTKs (a hedged sketch of such a loop is given below this table) |
| Open Source Code | Yes | We provide anonymous code for the full pipeline in the supplementary material. |
| Open Datasets | Yes | Datasets. To demonstrate the effectiveness of the proposed method, we provide experimental results on four benchmark datasets for classification tasks: MNIST [8], SVHN [9], CIFAR10 [10], and CIFAR100 [10]. |
| Dataset Splits | Yes | MNIST consists of 10 hand-written digits with 60,000 training and 10,000 test images, at size 28×28. SVHN also consists of 10 digit numbers with 73,257 training and 26,032 test images, at size 32×32; CIFAR10 contains 50,000 training and 10,000 test images, also 32×32 and equally split between 10 classes like airplane, frog, and truck. |
| Hardware Specification | Yes | The line plots represent accuracy whereas bar plots represent wall-clock time to query data, using an NVIDIA V100 GPU. |
| Software Dependencies | No | We implement a pipeline for the proposed method using PyTorch [41] and JAX [42]; our neural networks f are implemented in PyTorch, whereas the linearized models f^lin are implemented in JAX with the neural-tangents library [43]. The paper names these packages but does not provide specific version numbers for them. (A hedged sketch of computing an empirical NTK with neural-tangents follows the table.) |
| Experiment Setup | Yes | We employ a ResNet18 [44] and WideResNet [45] with one or two layers and maximum width of 640... As mentioned earlier, we use L2 loss... For the naïve version, we retrain the neural network for 15 epochs using SGD. ...At each cycle, we query 20, 1,000, 1,000, and 1,000 new data points on MNIST, SVHN, CIFAR10, and CIFAR100 from the subset of 4,000, 6,000, 6,000, and 6,000 unlabeled data points, and initialize the labeled set L with 100, 1,000, 1,000, and 10,000 data points. |
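
Below is the sketch referenced in the Pseudocode row. It is not the authors' implementation: it uses the infinite-width NTK of a small fully-connected stax model (the paper works with the empirical NTK of ResNet18/WideResNet networks), and the acquisition score shown here, the total change in pool predictions after hypothetically adding one pseudo-labeled candidate, is an illustrative look-ahead criterion rather than the paper's exact one. It only demonstrates the core idea: retraining on L ∪ {(x, ŷ)} is approximated by NTK regression, so every candidate can be scored without any SGD retraining.

```python
# Hedged sketch of NTK-based look-ahead active learning (not the authors' code).
import jax.numpy as jnp
from jax import random
from neural_tangents import stax

# A small fully-connected architecture; the paper uses ResNet18 / WideResNet instead.
_, _, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(10),
)

def ntk_predict(x_train, y_train, x_test, reg=1e-4):
    """Mean prediction of the NTK-linearized model fit to convergence on (x_train, y_train)."""
    k_tt = kernel_fn(x_train, x_train, 'ntk')
    k_st = kernel_fn(x_test, x_train, 'ntk')
    k_tt = k_tt + reg * jnp.trace(k_tt) / k_tt.shape[0] * jnp.eye(k_tt.shape[0])
    return k_st @ jnp.linalg.solve(k_tt, y_train)

def look_ahead_scores(x_lab, y_lab, x_pool):
    """Score each pool point by how much adding it (with its predicted pseudo-label)
    would change the predictions on the rest of the pool (illustrative criterion)."""
    base = ntk_predict(x_lab, y_lab, x_pool)               # current predictions on the pool
    pseudo = jnp.eye(y_lab.shape[1])[base.argmax(axis=1)]  # hypothetical one-hot labels
    scores = []
    for i in range(x_pool.shape[0]):
        x_aug = jnp.concatenate([x_lab, x_pool[i:i + 1]])
        y_aug = jnp.concatenate([y_lab, pseudo[i:i + 1]])
        new = ntk_predict(x_aug, y_aug, x_pool)            # "retrained" predictions, no SGD
        scores.append(jnp.abs(new - base).sum())
    return jnp.stack(scores)

# Example usage on random data (10 classes, flattened 32x32x3 inputs):
key = random.PRNGKey(0)
x_lab = random.normal(key, (20, 3072))
y_lab = jnp.eye(10)[random.randint(key, (20,), 0, 10)]
x_pool = random.normal(random.PRNGKey(1), (50, 3072))
query_idx = look_ahead_scores(x_lab, y_lab, x_pool).argsort()[-5:]  # top-5 points to query
```

In a sequential (streaming) setting, the same kernel regression can be refreshed after each query without refitting from scratch, for instance via a rank-one update of the kernel system, which is what makes label-by-label updates cheap compared to retraining with SGD.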
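
To make the Software Dependencies row concrete, here is a hedged sketch (again not the authors' pipeline) of how an empirical, finite-width NTK can be obtained with the neural-tangents library. The paper defines its networks f in PyTorch and builds the linearized model f^lin in JAX; the small stax MLP below, the input shape, and the choice of width 640 are stand-in assumptions for illustration only.

```python
# Hedged sketch: empirical (finite-width) NTK via neural-tangents.
import jax.numpy as jnp
from jax import random
import neural_tangents as nt
from neural_tangents import stax

init_fn, apply_fn, _ = stax.serial(
    stax.Dense(640), stax.Relu(),   # width 640, echoing the experiment setup row
    stax.Dense(10),
)

key = random.PRNGKey(0)
_, params = init_fn(key, (-1, 3072))        # flattened 32x32x3 inputs (assumed shape)

# kernel_fn(x1, x2, params) returns the empirical NTK of the finite network.
kernel_fn = nt.empirical_ntk_fn(apply_fn, trace_axes=(-1,))

x1 = random.normal(random.PRNGKey(1), (8, 3072))
x2 = random.normal(random.PRNGKey(2), (5, 3072))
theta = kernel_fn(x1, x2, params)           # shape (8, 5): empirical NTK entries
print(theta.shape)
```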