Gone Fishing: Neural Active Learning with Fisher Embeddings
Authors: Jordan Ash, Surbhi Goel, Akshay Krishnamurthy, Sham Kakade
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that BAIT outperforms the previous state of the art on both classification and regression problems, and is flexible enough to be used with a variety of model architectures. |
| Researcher Affiliation | Collaboration | Jordan T. Ash (Microsoft Research NYC, ash.jordan@microsoft.com); Surbhi Goel (Microsoft Research NYC, goel.surbhi@microsoft.com); Akshay Krishnamurthy (Microsoft Research NYC, akshaykr@microsoft.com); Sham Kakade (Microsoft Research NYC & University of Washington, sham.kakade@microsoft.com) |
| Pseudocode | Yes | Algorithm 1 BAIT.<br>Require: neural network f(x; θ), unlabeled pool of examples U, initial number of examples B_0, number of iterations T, number of examples in a batch B.<br>1: Initialize S by drawing B_0 labeled points from U and fit the model on S: θ_1 = argmin_θ E_S[ℓ(x, y; θ)]<br>2: for t = 1, 2, ..., T do<br>3: Compute I(θ_t^L) = (1/\|U\|) Σ_{x ∈ U} I(x; θ_t^L)<br>4: Initialize M_0 = λI + (1/\|S\|) Σ_{x ∈ S} I(x; θ_t^L)<br>5: for i = 1, 2, ..., 2B do {forward greedy optimization}<br>6: x̃ = argmin_{x ∈ U} tr((M_i + I(x; θ_t^L))^{-1} I(θ_t^L))<br>7: M_{i+1} ← M_i + I(x̃; θ_t^L); S ← S ∪ {x̃}<br>8: end for<br>9: for i = 2B, 2B−1, ..., B do {backward greedy optimization}<br>10: x̃ = argmin_{x ∈ S} tr((M_i − I(x; θ_t^L))^{-1} I(θ_t^L))<br>11: M_{i−1} ← M_i − I(x̃; θ_t^L); S ← S \ {x̃}<br>12: end for<br>13: Train model on S: θ_{t+1} = argmin_θ E_S[ℓ(x, y; θ)]<br>14: end for<br>15: return final model θ_{T+1} |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | We consider three datasets. Using an MLP, we perform active learning on both MNIST data and OpenML dataset 155. We also use the SVHN dataset [37] of color digit images with both an MLP and an 18-layer ResNet. Last we explore the CIFAR-10 object dataset [38] with a ResNet. |
| Dataset Splits | Yes | Each learner is initialized with 100 randomly sampled labeled points, and each experiment is repeated five times with different random seeds. Shadowed regions in plots denote standard error. More empirical details can be found in Appendix Section C. |
| Hardware Specification | No | The checklist question "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)?" is answered [Yes], citing the appendix. However, the appendix is not included in the provided text, so no hardware specifics are available here. |
| Software Dependencies | No | The paper mentions training with the "Adam variant of SGD" but does not specify version numbers for any software or libraries. |
| Experiment Setup | Yes | All ResNets are trained with a learning rate of 0.01, and all other models (including linear models shown earlier) are trained with a learning rate of 0.0001. We fit parameters using the Adam variant of SGD, and use standard data augmentation for all CIFAR-10 experiments. Like other deep active learning work, we avoid warm-starting and retrain model parameters from a random initialization after each query round [5]. Each learner is initialized with 100 randomly sampled labeled points, and each experiment is repeated five times with different random seeds. Shadowed regions in plots denote standard error. More empirical details can be found in Appendix Section C. |
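The forward–backward greedy selection in Algorithm 1 can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: it assumes per-example Fisher matrices are precomputed as dense `d × d` arrays (the paper works with low-rank last-layer Fisher approximations for efficiency), starts from an empty labeled set, and the function name `bait_select` is hypothetical.

```python
import numpy as np

def bait_select(fishers_pool, batch_size, lam=1e-2):
    """Hypothetical sketch of BAIT's forward-backward greedy selection.

    fishers_pool: array of shape (N, d, d), per-example Fisher
    approximations I(x; theta^L) over the unlabeled pool U.
    Returns the indices of the selected batch of size batch_size.
    """
    N, d, _ = fishers_pool.shape
    I_U = fishers_pool.mean(axis=0)   # pool Fisher I(theta^L), step 3
    M = lam * np.eye(d)               # M_0 = lambda*I (empty labeled set), step 4
    selected = []

    def objective(M_candidate):
        # tr(M^{-1} I_U): lower means the batch better covers the pool Fisher
        return np.trace(np.linalg.solve(M_candidate, I_U))

    # forward pass (steps 5-8): greedily add 2B points
    for _ in range(2 * batch_size):
        scores = [objective(M + fishers_pool[j]) if j not in selected else np.inf
                  for j in range(N)]
        best = int(np.argmin(scores))
        selected.append(best)
        M = M + fishers_pool[best]

    # backward pass (steps 9-12): greedily remove B points
    for _ in range(batch_size):
        scores = [objective(M - fishers_pool[j]) for j in selected]
        worst = selected[int(np.argmin(scores))]
        selected.remove(worst)
        M = M - fishers_pool[worst]

    return selected
```

The over-select-then-prune structure (add 2B, remove B) is what distinguishes BAIT from a purely forward greedy rule: early forward picks can become redundant once later ones are added, and the backward sweep discards them.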