Towards Fast Adaptation of Neural Architectures with Meta Learning

Authors: Dongze Lian, Yin Zheng, Yintao Xu, Yanxiong Lu, Leyu Lin, Peilin Zhao, Junzhou Huang, Shenghua Gao

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that T-NAS achieves state-of-the-art performance in few-shot learning and comparable performance in supervised learning but with 50x less searching cost, which demonstrates the effectiveness of our method.
Researcher Affiliation | Collaboration | (1) ShanghaiTech University; (2) Weixin Group, Tencent; (3) Tencent AI Lab; (4) University of Texas at Arlington
Pseudocode | Yes | The complete algorithm of T-NAS is shown in Alg. 1 (Algorithm 1: T-NAS, Transferable Neural Architecture Search).
Open Source Code | Yes | Code is available at https://github.com/dongzelian/T-NAS
Open Datasets | Yes | Omniglot is a handwritten character recognition dataset proposed in Lake et al. (2011), which contains 1623 characters with 20 samples for each class. We randomly split 1200 characters for training and the remaining for testing, and augment the Omniglot dataset by randomly rotating by multiples of 90 degrees following Santoro et al. (2016). The Mini-Imagenet dataset is sampled from the original ImageNet (Deng et al., 2009). The Fewshot-CIFAR100 (FC100) dataset is proposed in Oreshkin et al. (2018) and is based on the popular image classification dataset CIFAR-100.
Dataset Splits | Yes | All images are down-sampled to 84 × 84 pixels and the whole dataset consists of 64 training classes, 16 validation classes, and 20 test classes.
Hardware Specification | Yes | All search and evaluation experiments are performed using NVIDIA P40 GPUs.
Software Dependencies | No | The paper mentions optimizers such as SGD and Adam (Kingma & Ba, 2014) but does not provide specific software dependencies with version numbers (e.g., PyTorch, TensorFlow, or other library versions).
Experiment Setup | Yes | On the Mini-Imagenet dataset, one {normal + reduction} cell is trained for 10 epochs with 5000 independent tasks per epoch, and the initial channel count is set to 16. For the base-searcher, we use vanilla SGD to optimize the network weights w_i^m and architecture parameters θ_i^m with inner learning rates α_inner = 0.1 and β_inner = 30. The inner step M is set to 5 as a trade-off between accuracy and efficiency. For the meta-searcher, we use Adam (Kingma & Ba, 2014) to optimize the meta-architecture θ̃ and network weights w̃ with outer learning rates α_outer = 10^-3 and β_outer = 10^-3.
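
The Open Datasets and Dataset Splits rows above quote class-level splits (1200 training characters for Omniglot, 64/16/20 classes for Mini-Imagenet), rotation augmentation by multiples of 90 degrees, and 84 × 84 down-sampling. The following is a minimal sketch of how such preprocessing could be reproduced; the helper names, the fixed seed, and the use of PIL are assumptions for illustration, not the authors' data pipeline (the official splits are fixed class lists, not a random shuffle).

```python
# Hypothetical sketch of the class-level splits and preprocessing quoted above.
# The seed, helper names, and PIL-based loading are assumptions, not the paper's pipeline.
import random
from PIL import Image

def split_classes(class_names, n_train, n_val=0, seed=0):
    """Split class names at the class level (e.g. 64/16/20 for Mini-Imagenet,
    1200 training characters for Omniglot); remaining classes form the test split."""
    names = sorted(class_names)
    random.Random(seed).shuffle(names)  # assumed seed; the official splits are fixed lists
    train = names[:n_train]
    val = names[n_train:n_train + n_val]
    test = names[n_train + n_val:]
    return train, val, test

def omniglot_rotations(img):
    """Augment an Omniglot character by rotating it by multiples of 90 degrees."""
    return [img.rotate(angle) for angle in (0, 90, 180, 270)]

def load_miniimagenet_image(path):
    """Down-sample a Mini-Imagenet image to 84 x 84 pixels."""
    return Image.open(path).convert("RGB").resize((84, 84))
```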
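The Pseudocode and Experiment Setup rows describe a bi-level search: a base-searcher that adapts task-specific weights w_i^m and architecture parameters θ_i^m with vanilla SGD for M = 5 inner steps, and a meta-searcher that updates the meta-architecture θ̃ and meta-weights w̃ with Adam. The sketch below is a first-order approximation of such a loop on a toy mixed-operation "cell" with synthetic tasks; it is not the authors' Algorithm 1, and the network, task sampler, per-task meta-updates, and reduced epoch/task counts are assumptions made to keep the example short and runnable.

```python
# Minimal, first-order sketch of a T-NAS-style bi-level search loop (not the authors' Alg. 1).
# The toy mixed-op "cell", synthetic task sampler, and per-task meta-updates are assumptions.
import torch
import torch.nn.functional as F

N_WAY, K_SHOT, K_QUERY, DIM, HID = 5, 1, 15, 32, 64
ALPHA_INNER, BETA_INNER, INNER_STEPS = 0.1, 30, 5   # values quoted verbatim in the Experiment Setup row
OUTER_LR = 1e-3                                     # alpha_outer = beta_outer = 10^-3
EPOCHS, TASKS_PER_EPOCH = 2, 50                     # reduced from 10 x 5000 for a quick demo

def sample_task():
    """Synthetic stand-in for an N-way K-shot episode (support set + query set)."""
    protos = torch.randn(N_WAY, DIM)
    def draw(k):
        x = protos.repeat_interleave(k, dim=0) + 0.1 * torch.randn(N_WAY * k, DIM)
        y = torch.arange(N_WAY).repeat_interleave(k)
        return x, y
    return draw(K_SHOT), draw(K_QUERY)

def forward(x, w, theta):
    """One toy 'cell': a softmax over two candidate operations, mixed by theta."""
    a = F.softmax(theta, dim=0)
    h_linear = x @ w["w1"]
    h_gated = torch.tanh(x @ w["w2"]) * h_linear
    return (a[0] * h_linear + a[1] * h_gated) @ w["head"]

# Meta-parameters: shared network weights w~ and meta-architecture theta~.
meta_w = {"w1": (0.1 * torch.randn(DIM, HID)).requires_grad_(),
          "w2": (0.1 * torch.randn(DIM, HID)).requires_grad_(),
          "head": (0.1 * torch.randn(HID, N_WAY)).requires_grad_()}
meta_theta = torch.zeros(2, requires_grad=True)
meta_params = list(meta_w.values()) + [meta_theta]
optimizer = torch.optim.Adam(meta_params, lr=OUTER_LR)

for epoch in range(EPOCHS):
    for _ in range(TASKS_PER_EPOCH):
        (xs, ys), (xq, yq) = sample_task()
        # Base-searcher: adapt detached task-specific copies (w_i, theta_i) with vanilla SGD.
        w_i = {k: v.detach().clone().requires_grad_() for k, v in meta_w.items()}
        theta_i = meta_theta.detach().clone().requires_grad_()
        for _ in range(INNER_STEPS):
            loss = F.cross_entropy(forward(xs, w_i, theta_i), ys)
            grads = torch.autograd.grad(loss, list(w_i.values()) + [theta_i])
            w_i = {k: (v - ALPHA_INNER * g).detach().requires_grad_()
                   for (k, v), g in zip(w_i.items(), grads[:-1])}
            theta_i = (theta_i - BETA_INNER * grads[-1]).detach().requires_grad_()
        # Meta-searcher: the query-set loss of the adapted task model drives the Adam
        # update of w~ and theta~ (first-order approximation of the meta-gradient).
        meta_loss = F.cross_entropy(forward(xq, w_i, theta_i), yq)
        task_grads = torch.autograd.grad(meta_loss, list(w_i.values()) + [theta_i])
        optimizer.zero_grad()
        for p, g in zip(meta_params, task_grads):
            p.grad = g
        optimizer.step()
    print(f"epoch {epoch}: last query loss {meta_loss.item():.3f}")
```

The first-order meta-gradient is a deliberate simplification; a second-order implementation would backpropagate through the inner SGD updates instead of treating the adapted parameters as fresh leaves.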