Deep Active Learning with a Neural Architecture Search
Authors: Yonatan Geifman, Ran El-Yaniv
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our strategy using three known querying techniques (softmax response, MC-dropout, and coresets) and show that the proposed approach overwhelmingly outperforms active learning using fixed architectures. We demonstrate this advantage of active-iNAS with the above three querying functions over three image classification datasets: CIFAR-10, CIFAR-100, and SVHN. In Figure 2(a) we see the results obtained by active-iNAS and two fixed architectures for classifying CIFAR-10 images using the softmax response querying function. (An illustrative sketch of the softmax response querying rule appears below the table.) |
| Researcher Affiliation | Academia | Yonatan Geifman Technion Israel Institute of Technology yonatan.g@cs.technion.ac.il Ran El-Yaniv Technion Israel Institute of Technology rani@cs.technion.ac.il |
| Pseudocode | Yes | Algorithm 1: iNAS; Algorithm 2: Deep Active Learning with iNAS |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We demonstrate this advantage of active-iNAS with the above three querying functions over three image classification datasets: CIFAR-10, CIFAR-100, and SVHN. |
| Dataset Splits | Yes | Let S', V be a train-test random split of S. The initial block contains a convolutional layer with a filter size of 3×3 and depth of 64, followed by a max-pooling layer having a spatial size of 3×3 and strides of 2. The active learning was implemented with an initial labeled training seed (k) of 2,000 instances. The active mini-batch size (b) was initialized to 2,000 instances and updated to 5,000 after reaching 10,000 labeled instances. The maximal budget was set to 50,000 for all datasets. |
| Hardware Specification | Yes | For example, in the CIFAR-10 experiment T_SGD = 200 requires less than 2 GPU hours (on average) for an active learning round (Nvidia Titan-Xp GPU). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | We trained all models using stochastic gradient descent (SGD) with a batch size of 128 and momentum of 0.9 for 200 epochs. We used a learning rate of 0.1, with a learning rate multiplicative decay of 0.1 after epochs 100 and 150. We fixed the size of an epoch to be 50,000 instances (by oversampling), regardless of the current size of the training set S_t. A weight decay of 5e-4 was used, and standard data augmentation was applied containing horizontal flips, four-pixel shifts, and up to 15-degree rotations. The active learning was implemented with an initial labeled training seed (k) of 2,000 instances. The active mini-batch size (b) was initialized to 2,000 instances and updated to 5,000 after reaching 10,000 labeled instances. The maximal budget was set to 50,000 for all datasets. For time efficiency reasons, the iNAS algorithm was implemented with T_iNAS = 1, and the training of new architectures in iNAS was early-stopped after 50 epochs, similar to what was done in [22]. (A minimal sketch of this training configuration and active-learning schedule follows the table.) |
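
The softmax response querying function quoted in the Research Type row is the least-confidence acquisition rule: score each unlabeled pool example by its maximum softmax probability and query the least confident ones. The sketch below is a minimal illustration of that rule, not the authors' implementation; `model`, `pool_loader`, `b`, and `device` are hypothetical names, and PyTorch is assumed only for illustration.

```python
import torch
import torch.nn.functional as F

def softmax_response_query(model, pool_loader, b, device="cuda"):
    """Select the b pool examples whose maximum softmax probability is lowest.

    Illustrative sketch of the 'softmax response' (least-confidence) querying
    rule; all names here are assumptions, not code from the paper.
    """
    model.eval()
    confidences = []
    with torch.no_grad():
        for x, _ in pool_loader:                     # pool labels are unused
            probs = F.softmax(model(x.to(device)), dim=1)
            confidences.append(probs.max(dim=1).values.cpu())
    confidences = torch.cat(confidences)
    # The least-confident examples are the ones queried for labeling.
    return torch.topk(-confidences, k=b).indices
```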
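
The hyperparameters quoted in the Experiment Setup row translate into a standard SGD recipe. The sketch below, assuming PyTorch and torchvision, shows one way to wire them up: the 50,000-instance epoch obtained by oversampling, the stated augmentation, the milestone learning-rate decay, and the active mini-batch schedule (2,000 until 10,000 labels, then 5,000). All function and variable names are illustrative assumptions, not code from the paper.

```python
import torch
import torchvision.transforms as T
from torch.utils.data import DataLoader, RandomSampler

# Hyperparameters quoted in the Experiment Setup row.
BATCH_SIZE, EPOCHS, EPOCH_SIZE = 128, 200, 50_000

# Standard augmentation: horizontal flips, four-pixel shifts, up to 15-degree rotations.
train_transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.RandomRotation(15),
    T.ToTensor(),
])

def make_loader(labeled_set):
    # Fix the epoch size at 50,000 instances by oversampling with replacement,
    # regardless of how many labeled examples the active learner currently holds.
    sampler = RandomSampler(labeled_set, replacement=True, num_samples=EPOCH_SIZE)
    return DataLoader(labeled_set, batch_size=BATCH_SIZE, sampler=sampler)

def make_optimizer(model):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    # Multiply the learning rate by 0.1 after epochs 100 and 150.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 150], gamma=0.1)
    return optimizer, scheduler

def active_batch_size(num_labeled):
    # Active mini-batch size b: 2,000 until 10,000 labels are acquired, then 5,000.
    return 2_000 if num_labeled < 10_000 else 5_000
```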