Deep Active Learning with a Neural Architecture Search
Authors: Yonatan Geifman, Ran El-Yaniv
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our strategy using three known querying techniques (softmax response, MC-dropout, and coresets) and show that the proposed approach overwhelmingly outperforms active learning using fixed architectures. We demonstrate this advantage of active-iNAS with the above three querying functions over three image classification datasets: CIFAR-10, CIFAR-100, and SVHN. In Figure 2(a) we see the results obtained by active-iNAS and two fixed architectures for classifying CIFAR-10 images using the softmax response querying function. (An illustrative sketch of the softmax response querying rule appears below the table.) |
| Researcher Affiliation | Academia | Yonatan Geifman Technion Israel Institute of Technology yonatan.g@cs.technion.ac.il Ran El-Yaniv Technion Israel Institute of Technology rani@cs.technion.ac.il |
| Pseudocode | Yes | Algorithm 1: iNAS; Algorithm 2: Deep Active Learning with iNAS |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We demonstrate this advantage of active-iNAS with the above three querying functions over three image classification datasets: CIFAR-10, CIFAR-100, and SVHN. |
| Dataset Splits | Yes | Let S', V be a train-test random split of S. The initial block contains a convolutional layer with a filter size of 3×3 and depth of 64, followed by a max-pooling layer having a spatial size of 3×3 and strides of 2. The active learning was implemented with an initial labeled training seed (k) of 2,000 instances. The active mini-batch size (b) was initialized to 2,000 instances and updated to 5,000 after reaching 10,000 labeled instances. The maximal budget was set to 50,000 for all datasets. |
| Hardware Specification | Yes | For example, in the CIFAR-10 experiment T_SGD = 200 requires less than 2 GPU hours (on average) for an active learning round (Nvidia Titan-Xp GPU). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | We trained all models using stochastic gradient descent (SGD) with a batch size of 128 and momentum of 0.9 for 200 epochs. We used a learning rate of 0.1, with a learning rate multiplicative decay of 0.1 after epochs 100 and 150. We fixed the size of an epoch to be 50,000 instances (by oversampling), regardless of the current size of the training set S_t. A weight decay of 5e-4 was used, and standard data augmentation was applied containing horizontal flips, four-pixel shifts, and up to 15-degree rotations. The active learning was implemented with an initial labeled training seed (k) of 2,000 instances. The active mini-batch size (b) was initialized to 2,000 instances and updated to 5,000 after reaching 10,000 labeled instances. The maximal budget was set to 50,000 for all datasets. For time efficiency reasons, the iNAS algorithm was implemented with T_iNAS = 1, and the training of new architectures in iNAS was early-stopped after 50 epochs, similar to what was done in [22]. (A minimal sketch of this training configuration and active-learning schedule follows the table.) |
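
The softmax response querying function quoted in the Research Type row is the least-confidence acquisition rule: score each unlabeled pool example by its maximum softmax probability and query the least confident ones. The sketch below is a minimal illustration of that rule, not the authors' implementation; `model`, `pool_loader`, `b`, and `device` are hypothetical names, and PyTorch is assumed only for illustration.

```python
import torch
import torch.nn.functional as F

def softmax_response_query(model, pool_loader, b, device="cuda"):
    """Select the b pool examples whose maximum softmax probability is lowest.

    Illustrative sketch of the 'softmax response' (least-confidence) querying
    rule; all names here are assumptions, not code from the paper.
    """
    model.eval()
    confidences = []
    with torch.no_grad():
        for x, _ in pool_loader:                     # pool labels are unused
            probs = F.softmax(model(x.to(device)), dim=1)
            confidences.append(probs.max(dim=1).values.cpu())
    confidences = torch.cat(confidences)
    # The least-confident examples are the ones queried for labeling.
    return torch.topk(-confidences, k=b).indices
```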
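
The hyperparameters quoted in the Experiment Setup row translate into a standard SGD recipe. The sketch below, assuming PyTorch and torchvision, shows one way to wire them up: the 50,000-instance epoch obtained by oversampling, the stated augmentation, the milestone learning-rate decay, and the active mini-batch schedule (2,000 until 10,000 labels, then 5,000). All function and variable names are illustrative assumptions, not code from the paper.

```python
import torch
import torchvision.transforms as T
from torch.utils.data import DataLoader, RandomSampler

# Hyperparameters quoted in the Experiment Setup row.
BATCH_SIZE, EPOCHS, EPOCH_SIZE = 128, 200, 50_000

# Standard augmentation: horizontal flips, four-pixel shifts, up to 15-degree rotations.
train_transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(32, padding=4),
    T.RandomRotation(15),
    T.ToTensor(),
])

def make_loader(labeled_set):
    # Fix the epoch size at 50,000 instances by oversampling with replacement,
    # regardless of how many labeled examples the active learner currently holds.
    sampler = RandomSampler(labeled_set, replacement=True, num_samples=EPOCH_SIZE)
    return DataLoader(labeled_set, batch_size=BATCH_SIZE, sampler=sampler)

def make_optimizer(model):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    # Multiply the learning rate by 0.1 after epochs 100 and 150.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 150], gamma=0.1)
    return optimizer, scheduler

def active_batch_size(num_labeled):
    # Active mini-batch size b: 2,000 until 10,000 labels are acquired, then 5,000.
    return 2_000 if num_labeled < 10_000 else 5_000
```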