NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search

Authors: Rameswar Panda, Michele Merler, Mayoore S Jaiswal, Hui Wu, Kandan Ramakrishnan, Ulrich Finkler, Chun-Fu Richard Chen, Minsik Cho, Rogerio Feris, David Kung, Bishwaranjan Bhattacharjee

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we propose to analyze the architecture transferability of different NAS methods by performing a series of experiments on large scale benchmarks such as ImageNet1K and ImageNet22K. We believe that our extensive empirical analysis will prove useful for future design of NAS algorithms.
Researcher Affiliation | Collaboration | Rameswar Panda (1,2), Michele Merler (1), Mayoore S Jaiswal (1), Hui Wu (1,2), Kandan Ramakrishnan (4), Ulrich Finkler (1), Chun-Fu Richard Chen (1,2), Minsik Cho (1), Rogerio Feris (1,2), David Kung (1), Bishwaranjan Bhattacharjee (1); 1: IBM Research, 2: MIT-IBM Watson AI Lab
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is available, nor does it provide a link to a code repository.
Open Datasets | Yes | We select six diverse and challenging computer vision datasets in image classification, namely MIT67 (Quattoni and Torralba 2009), FLOWERS102 (Nilsback and Zisserman 2008), CIFAR10 and CIFAR100 (Krizhevsky 2009), ImageNet1K (Deng et al. 2009) and ImageNet22K (Russakovsky et al. 2015) to evaluate the performance of different methods.
Dataset Splits | Yes | We split each of those datasets into training, validation and testing subsets with proportions 40/40/20 and use standard data pre-processing and augmentation techniques. (See the split sketch after the table.)
Hardware Specification | No | The paper mentions 'a single GPU' for searches and 'minimum of 8 to a maximum of 96 GPUs' for augmentation runs, and refers to the Oak Ridge Leadership Computing Facility (ORNL) and the IBM T.J. Watson Research Center Scaling Cluster (WSC), but does not specify exact GPU/CPU models or detailed hardware configurations.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For augmentations, we use cross entropy loss, SGD optimizer with learning rate 0.025, momentum 0.9, seed 2, initial number of channels 36, and gradient clipping set at 5. The number of cells was fixed to 20 for all experiments and the number of training epochs per dataset was set to 600, 600, 600, 600, 120 and 60 for augment runs on MIT67, FLOWERS102, CIFAR10, CIFAR100, ImageNet1K and ImageNet22K, respectively. (See the training-loop sketch after the table.)
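The 40/40/20 proportions quoted in the Dataset Splits row can be written as a minimal sketch. This is a hypothetical illustration, not the authors' released code: it assumes a map-style PyTorch dataset and uses `torch.utils.data.random_split`; reusing the paper's reported seed of 2 for the split itself is an assumption.

```python
# Hypothetical sketch (not the authors' released code): a 40/40/20
# train/val/test split of a map-style PyTorch dataset.
# Reusing the paper's reported seed of 2 for the split is an assumption.
import torch
from torch.utils.data import Dataset, random_split

def split_40_40_20(dataset: Dataset, seed: int = 2):
    """Return (train, val, test) subsets with 40/40/20 proportions."""
    n = len(dataset)
    n_train = int(0.4 * n)
    n_val = int(0.4 * n)
    n_test = n - n_train - n_val  # remainder absorbs rounding
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```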
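The hyperparameters in the Experiment Setup row map onto a conventional PyTorch training loop. The sketch below is an illustration under stated assumptions, not the paper's implementation: `model`, `train_loader`, and `epochs` are caller-supplied placeholders, the reported initial channel count (36) and cell count (20) belong to the network definition (not shown), and "gradient clipping set at 5" is interpreted here as clipping the gradient norm at 5.

```python
# Hypothetical sketch (not the paper's implementation): an augment-phase
# training loop using the hyperparameters quoted in the table above.
# `model`, `train_loader`, and `epochs` are placeholders supplied by the
# caller; the 36 initial channels and 20 cells are properties of the
# network definition and are not shown here.
import torch
import torch.nn as nn

def augment_train(model: nn.Module, train_loader, epochs: int, device: str = "cuda"):
    torch.manual_seed(2)                              # seed 2, as reported
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()                 # cross entropy loss
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=0.025,             # learning rate 0.025
                                momentum=0.9)         # momentum 0.9
    for _ in range(epochs):                           # 600 / 120 / 60 depending on dataset
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            # "gradient clipping set at 5", read here as a max gradient norm of 5
            nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
            optimizer.step()
    return model
```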