Learning Versatile Neural Architectures by Propagating Network Codes
Authors: Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work explores how to design a single neural network capable of adapting to multiple heterogeneous vision tasks, such as image segmentation, 3D detection, and video recognition. This goal is challenging because both network architecture search (NAS) spaces and methods in different tasks are inconsistent. We solve this challenge from both sides. We first introduce a unified design space for multiple tasks and build a multitask NAS benchmark (NAS-Bench-MR) on many widely used datasets, including ImageNet, Cityscapes, KITTI, and HMDB51. We further propose Network Coding Propagation (NCP), which back-propagates gradients of neural predictors to directly update architecture codes along the desired gradient directions to solve various tasks. In this way, optimal architecture configurations can be found by NCP in our large search space in seconds. |
| Researcher Affiliation | Collaboration | Mingyu Ding¹, Yuqi Huo², Haoyu Lu², Linjie Yang³, Zhe Wang⁴, Zhiwu Lu², Jingdong Wang⁵, Ping Luo¹; ¹The University of Hong Kong, ²Gaoling School of Artificial Intelligence, Renmin University of China, ³ByteDance Inc., ⁴SenseTime Research, ⁵Baidu |
| Pseudocode | Yes | Algorithm 1 The network propagation process. |
| Open Source Code | Yes | Code is available at github.com/dingmyu/NCP |
| Open Datasets | Yes | We build a multitask NAS benchmark (NAS-Bench-MR) on many widely used datasets, including ImageNet, Cityscapes, KITTI, and HMDB51. |
| Dataset Splits | Yes | To train the neural predictor, 2000 and 500 structures in the benchmark are used as the training and validation sets for each task. |
| Hardware Specification | Yes | The initial learning rate is set to 0.1 with a total batch size of 160 on 2 Tesla V100 GPUs for 100 epochs... The initial learning rate is set to 0.1 with a total batch size of 64 on 8 Tesla V100 GPUs for 25000 iterations... We use the one-cycle scheduler with an initial learning rate of 2e-3, a minimum learning rate of 2e-4, and batch size 16 on 8 Tesla V100 GPUs for 80 epochs... The initial learning rate is set to 0.01 with a total batch size of 80 on 4 Tesla V100 GPUs for 100 epochs |
| Software Dependencies | No | The paper mentions using SGD and Adam optimizers, but does not provide specific version numbers for any software libraries, frameworks, or languages used. |
| Experiment Setup | Yes | Unless specified, we use continuous propagation with an initial code of {b, n = 2; c, i, o = 64} and λ = 0.5 for 70 iterations in all experiments. The optimization goal is set to higher performance and lower FLOPs (t_acc = p_acc + 1, t_flops = p_flops − 1). (See the sketches after this table.) |
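
The Dataset Splits row above states that the neural predictor is fit on 2000 benchmark structures per task, with 500 held out for validation. Below is a minimal sketch of such a predictor under stated assumptions, not the authors' implementation: the MLP architecture, hidden width, epoch count, learning rate, and the use of Adam here are illustrative choices, and `train_codes`, `train_metrics`, `val_codes`, `val_metrics` are hypothetical names for the prepared split.

```python
# Hedged sketch of training a neural predictor that maps an architecture
# code vector to a measured metric (e.g. accuracy). All hyperparameters
# below are assumptions for illustration, not values from the paper.
import torch
import torch.nn as nn

class CodePredictor(nn.Module):
    """Small MLP mapping an architecture code to one predicted metric."""
    def __init__(self, code_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, code):
        return self.net(code).squeeze(-1)

def train_predictor(train_codes, train_metrics, val_codes, val_metrics,
                    epochs=300, lr=1e-3):
    """Fit the predictor on the 2000-structure training split and report
    validation error on the 500 held-out structures."""
    model = CodePredictor(train_codes.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        pred = model(train_codes)
        loss = nn.functional.mse_loss(pred, train_metrics)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        val_err = nn.functional.mse_loss(model(val_codes), val_metrics)
    return model, val_err.item()
```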
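
The Pseudocode and Experiment Setup rows describe the propagation step itself: gradients of a frozen predictor are back-propagated onto the architecture code for 70 iterations with λ = 0.5, targeting one unit higher accuracy and one unit lower FLOPs than the current prediction. The sketch below illustrates that loop under assumptions not fixed by the table: the predictor is taken to return an (accuracy, FLOPs) pair, λ is assumed to weight the two objective terms, and the Adam update with a 0.1 step size on the code is an illustrative choice.

```python
# Hedged sketch of Network Coding Propagation (NCP), not the authors' code.
# Only the architecture code is optimized; the predictor stays frozen.
import torch

def propagate_code(predictor, code, lam=0.5, steps=70, lr=0.1):
    """Back-propagate predictor gradients to the architecture code itself."""
    # Freeze the predictor so gradients update only the code.
    for p in predictor.parameters():
        p.requires_grad_(False)
    code = code.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([code], lr=lr)

    for _ in range(steps):
        acc_pred, flops_pred = predictor(code)  # assumed (accuracy, FLOPs) output
        # Moving targets as in the Experiment Setup row: one unit higher
        # accuracy and one unit lower FLOPs than the current prediction.
        t_acc = acc_pred.detach() + 1.0
        t_flops = flops_pred.detach() - 1.0
        # Assumed form of the objective: lambda trades off the two terms.
        loss = lam * (acc_pred - t_acc).pow(2) + (1.0 - lam) * (flops_pred - t_flops).pow(2)
        optimizer.zero_grad()
        loss.backward()          # gradients flow through the predictor into `code`
        optimizer.step()

    return code.detach()         # refined architecture code
```

After the loop, the continuous code would be rounded back to valid discrete architecture settings before retraining; that discretization step is not shown here.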