Towards Fast Adaptation of Neural Architectures with Meta Learning

Authors: Dongze Lian, Yin Zheng, Yintao Xu, Yanxiong Lu, Leyu Lin, Peilin Zhao, Junzhou Huang, Shenghua Gao

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that T-NAS achieves state-of-the-art performance in few-shot learning and comparable performance in supervised learning but with 50x less searching cost, which demonstrates the effectiveness of our method.
Researcher Affiliation | Collaboration | (1) ShanghaiTech University; (2) Weixin Group, Tencent; (3) Tencent AI Lab; (4) University of Texas at Arlington
Pseudocode | Yes | The complete algorithm of T-NAS is shown in Alg. 1 (Algorithm 1: T-NAS, Transferable Neural Architecture Search).
Open Source Code | Yes | Code is available at https://github.com/dongzelian/T-NAS
Open Datasets | Yes | Omniglot is a handwritten character recognition dataset proposed in Lake et al. (2011), which contains 1623 characters with 20 samples for each class. We randomly split 1200 characters for training and the remaining for testing, and augment the Omniglot dataset by randomly rotating by multiples of 90 degrees following Santoro et al. (2016). The Mini-Imagenet dataset is sampled from the original ImageNet (Deng et al., 2009). The Fewshot-CIFAR100 (FC100) dataset is proposed in Oreshkin et al. (2018) and is based on the popular image classification dataset CIFAR-100.
Dataset Splits | Yes | All images are down-sampled to 84 × 84 pixels and the whole dataset consists of 64 training classes, 16 validation classes, and 20 test classes.
Hardware Specification | Yes | All search and evaluation experiments are performed using NVIDIA P40 GPUs.
Software Dependencies | No | The paper mentions optimizers such as SGD and Adam (Kingma & Ba, 2014) but does not provide specific software dependencies with version numbers (e.g., PyTorch, TensorFlow, or other library versions).
Experiment Setup | Yes | On the Mini-Imagenet dataset, one {normal + reduction} cell is trained for 10 epochs with 5000 independent tasks per epoch, and the initial channel count is set to 16. For the base-searcher, we use vanilla SGD to optimize the network weights w_i^m and architecture parameters θ_i^m with inner learning rates α_inner = 0.1 and β_inner = 30. The inner step M is set to 5 as a trade-off between accuracy and efficiency. For the meta-searcher, we use Adam (Kingma & Ba, 2014) to optimize the meta-architecture θ̃ and network weights w̃ with outer learning rates α_outer = 10^-3 and β_outer = 10^-3.
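
The Open Datasets and Dataset Splits rows above quote class-level splits (1200 training characters for Omniglot, 64/16/20 classes for Mini-Imagenet), rotation augmentation by multiples of 90 degrees, and 84 × 84 down-sampling. The following is a minimal sketch of how such preprocessing could be reproduced; the helper names, the fixed seed, and the use of PIL are assumptions for illustration, not the authors' data pipeline (the official splits are fixed class lists, not a random shuffle).

```python
# Hypothetical sketch of the class-level splits and preprocessing quoted above.
# The seed, helper names, and PIL-based loading are assumptions, not the paper's pipeline.
import random
from PIL import Image

def split_classes(class_names, n_train, n_val=0, seed=0):
    """Split class names at the class level (e.g. 64/16/20 for Mini-Imagenet,
    1200 training characters for Omniglot); remaining classes form the test split."""
    names = sorted(class_names)
    random.Random(seed).shuffle(names)  # assumed seed; the official splits are fixed lists
    train = names[:n_train]
    val = names[n_train:n_train + n_val]
    test = names[n_train + n_val:]
    return train, val, test

def omniglot_rotations(img):
    """Augment an Omniglot character by rotating it by multiples of 90 degrees."""
    return [img.rotate(angle) for angle in (0, 90, 180, 270)]

def load_miniimagenet_image(path):
    """Down-sample a Mini-Imagenet image to 84 x 84 pixels."""
    return Image.open(path).convert("RGB").resize((84, 84))
```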
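The Pseudocode and Experiment Setup rows describe a bi-level search: a base-searcher that adapts task-specific weights w_i^m and architecture parameters θ_i^m with vanilla SGD for M = 5 inner steps, and a meta-searcher that updates the meta-architecture θ̃ and meta-weights w̃ with Adam. The sketch below is a first-order approximation of such a loop on a toy mixed-operation "cell" with synthetic tasks; it is not the authors' Algorithm 1, and the network, task sampler, per-task meta-updates, and reduced epoch/task counts are assumptions made to keep the example short and runnable.

```python
# Minimal, first-order sketch of a T-NAS-style bi-level search loop (not the authors' Alg. 1).
# The toy mixed-op "cell", synthetic task sampler, and per-task meta-updates are assumptions.
import torch
import torch.nn.functional as F

N_WAY, K_SHOT, K_QUERY, DIM, HID = 5, 1, 15, 32, 64
ALPHA_INNER, BETA_INNER, INNER_STEPS = 0.1, 30, 5   # values quoted verbatim in the Experiment Setup row
OUTER_LR = 1e-3                                     # alpha_outer = beta_outer = 10^-3
EPOCHS, TASKS_PER_EPOCH = 2, 50                     # reduced from 10 x 5000 for a quick demo

def sample_task():
    """Synthetic stand-in for an N-way K-shot episode (support set + query set)."""
    protos = torch.randn(N_WAY, DIM)
    def draw(k):
        x = protos.repeat_interleave(k, dim=0) + 0.1 * torch.randn(N_WAY * k, DIM)
        y = torch.arange(N_WAY).repeat_interleave(k)
        return x, y
    return draw(K_SHOT), draw(K_QUERY)

def forward(x, w, theta):
    """One toy 'cell': a softmax over two candidate operations, mixed by theta."""
    a = F.softmax(theta, dim=0)
    h_linear = x @ w["w1"]
    h_gated = torch.tanh(x @ w["w2"]) * h_linear
    return (a[0] * h_linear + a[1] * h_gated) @ w["head"]

# Meta-parameters: shared network weights w~ and meta-architecture theta~.
meta_w = {"w1": (0.1 * torch.randn(DIM, HID)).requires_grad_(),
          "w2": (0.1 * torch.randn(DIM, HID)).requires_grad_(),
          "head": (0.1 * torch.randn(HID, N_WAY)).requires_grad_()}
meta_theta = torch.zeros(2, requires_grad=True)
meta_params = list(meta_w.values()) + [meta_theta]
optimizer = torch.optim.Adam(meta_params, lr=OUTER_LR)

for epoch in range(EPOCHS):
    for _ in range(TASKS_PER_EPOCH):
        (xs, ys), (xq, yq) = sample_task()
        # Base-searcher: adapt detached task-specific copies (w_i, theta_i) with vanilla SGD.
        w_i = {k: v.detach().clone().requires_grad_() for k, v in meta_w.items()}
        theta_i = meta_theta.detach().clone().requires_grad_()
        for _ in range(INNER_STEPS):
            loss = F.cross_entropy(forward(xs, w_i, theta_i), ys)
            grads = torch.autograd.grad(loss, list(w_i.values()) + [theta_i])
            w_i = {k: (v - ALPHA_INNER * g).detach().requires_grad_()
                   for (k, v), g in zip(w_i.items(), grads[:-1])}
            theta_i = (theta_i - BETA_INNER * grads[-1]).detach().requires_grad_()
        # Meta-searcher: the query-set loss of the adapted task model drives the Adam
        # update of w~ and theta~ (first-order approximation of the meta-gradient).
        meta_loss = F.cross_entropy(forward(xq, w_i, theta_i), yq)
        task_grads = torch.autograd.grad(meta_loss, list(w_i.values()) + [theta_i])
        optimizer.zero_grad()
        for p, g in zip(meta_params, task_grads):
            p.grad = g
        optimizer.step()
    print(f"epoch {epoch}: last query loss {meta_loss.item():.3f}")
```

The first-order meta-gradient is a deliberate simplification; a second-order implementation would backpropagate through the inner SGD updates instead of treating the adapted parameters as fresh leaves.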