Learning Versatile Neural Architectures by Propagating Network Codes
Authors: Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work explores how to design a single neural network capable of adapting to multiple heterogeneous vision tasks, such as image segmentation, 3D detection, and video recognition. This goal is challenging because both network architecture search (NAS) spaces and methods in different tasks are inconsistent. We solve this challenge from both sides. We first introduce a unified design space for multiple tasks and build a multitask NAS benchmark (NAS-Bench-MR) on many widely used datasets, including ImageNet, Cityscapes, KITTI, and HMDB51. We further propose Network Coding Propagation (NCP), which back-propagates gradients of neural predictors to directly update architecture codes along the desired gradient directions to solve various tasks. In this way, optimal architecture configurations can be found by NCP in our large search space in seconds. |
| Researcher Affiliation | Collaboration | Mingyu Ding¹, Yuqi Huo², Haoyu Lu², Linjie Yang³, Zhe Wang⁴, Zhiwu Lu², Jingdong Wang⁵, Ping Luo¹; ¹The University of Hong Kong, ²Gaoling School of Artificial Intelligence, Renmin University of China, ³ByteDance Inc., ⁴SenseTime Research, ⁵Baidu |
| Pseudocode | Yes | Algorithm 1 The network propagation process. |
| Open Source Code | Yes | Code is available at github.com/dingmyu/NCP |
| Open Datasets | Yes | We build a multitask NAS benchmark (NAS-Bench-MR) on many widely used datasets, including ImageNet, Cityscapes, KITTI, and HMDB51. |
| Dataset Splits | Yes | To train the neural predictor, 2000 and 500 structures in the benchmark are used as the training and validation sets for each task. |
| Hardware Specification | Yes | The initial learning rate is set to 0.1 with a total batch size of 160 on 2 Tesla V100 GPUs for 100 epochs... The initial learning rate is set to 0.1 with a total batch size of 64 on 8 Tesla V100 GPUs for 25000 iterations... We use the one-cycle scheduler with an initial learning rate of 2e-3, a minimum learning rate of 2e-4, and batch size 16 on 8 Tesla V100 GPUs for 80 epochs... The initial learning rate is set to 0.01 with a total batch size of 80 on 4 Tesla V100 GPUs for 100 epochs |
| Software Dependencies | No | The paper mentions using SGD and Adam optimizers, but does not provide specific version numbers for any software libraries, frameworks, or languages used. |
| Experiment Setup | Yes | Unless specified, we use continuous propagation with an initial code of {b, n = 2; c, i, o = 64} and λ = 0.5 for 70 iterations in all experiments. The optimization goal is set to higher performance and lower FLOPs (t_acc = p_acc + 1, t_flops = p_flops − 1). (See the sketches after this table.) |
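
The Dataset Splits row above states that the neural predictor is fit on 2000 benchmark structures per task, with 500 held out for validation. Below is a minimal sketch of such a predictor under stated assumptions, not the authors' implementation: the MLP architecture, hidden width, epoch count, learning rate, and the use of Adam here are illustrative choices, and `train_codes`, `train_metrics`, `val_codes`, `val_metrics` are hypothetical names for the prepared split.

```python
# Hedged sketch of training a neural predictor that maps an architecture
# code vector to a measured metric (e.g. accuracy). All hyperparameters
# below are assumptions for illustration, not values from the paper.
import torch
import torch.nn as nn

class CodePredictor(nn.Module):
    """Small MLP mapping an architecture code to one predicted metric."""
    def __init__(self, code_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, code):
        return self.net(code).squeeze(-1)

def train_predictor(train_codes, train_metrics, val_codes, val_metrics,
                    epochs=300, lr=1e-3):
    """Fit the predictor on the 2000-structure training split and report
    validation error on the 500 held-out structures."""
    model = CodePredictor(train_codes.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        pred = model(train_codes)
        loss = nn.functional.mse_loss(pred, train_metrics)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        val_err = nn.functional.mse_loss(model(val_codes), val_metrics)
    return model, val_err.item()
```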
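
The Pseudocode and Experiment Setup rows describe the propagation step itself: gradients of a frozen predictor are back-propagated onto the architecture code for 70 iterations with λ = 0.5, targeting one unit higher accuracy and one unit lower FLOPs than the current prediction. The sketch below illustrates that loop under assumptions not fixed by the table: the predictor is taken to return an (accuracy, FLOPs) pair, λ is assumed to weight the two objective terms, and the Adam update with a 0.1 step size on the code is an illustrative choice.

```python
# Hedged sketch of Network Coding Propagation (NCP), not the authors' code.
# Only the architecture code is optimized; the predictor stays frozen.
import torch

def propagate_code(predictor, code, lam=0.5, steps=70, lr=0.1):
    """Back-propagate predictor gradients to the architecture code itself."""
    # Freeze the predictor so gradients update only the code.
    for p in predictor.parameters():
        p.requires_grad_(False)
    code = code.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([code], lr=lr)

    for _ in range(steps):
        acc_pred, flops_pred = predictor(code)  # assumed (accuracy, FLOPs) output
        # Moving targets as in the Experiment Setup row: one unit higher
        # accuracy and one unit lower FLOPs than the current prediction.
        t_acc = acc_pred.detach() + 1.0
        t_flops = flops_pred.detach() - 1.0
        # Assumed form of the objective: lambda trades off the two terms.
        loss = lam * (acc_pred - t_acc).pow(2) + (1.0 - lam) * (flops_pred - t_flops).pow(2)
        optimizer.zero_grad()
        loss.backward()          # gradients flow through the predictor into `code`
        optimizer.step()

    return code.detach()         # refined architecture code
```

After the loop, the continuous code would be rounded back to valid discrete architecture settings before retraining; that discretization step is not shown here.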