Fast Neural Network Adaptation via Parameter Remapping and Architecture Search
Authors: Jiemin Fang*, Yuzhu Sun*, Kangjian Peng*, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we conduct FNA on MobileNetV2 to obtain new networks for both segmentation and detection that clearly out-perform existing networks designed both manually and by NAS. |
| Researcher Affiliation | Collaboration | Jiemin Fang¹, Yuzhu Sun¹, Kangjian Peng², Qian Zhang², Yuan Li², Wenyu Liu¹, Xinggang Wang¹ — ¹School of EIC, Huazhong University of Science and Technology; ²Horizon Robotics |
| Pseudocode | Yes | Algorithm 1: Weights Remapping Function (a hedged remapping sketch follows the table) |
| Open Source Code | Yes | The code is available at https://github.com/JaminFong/FNA. |
| Open Datasets | Yes | The semantic segmentation experiments are conducted on the Cityscapes (Cordts et al., 2016) dataset. ... The experiments are conducted on the MS-COCO dataset (Lin et al., 2014b). |
| Dataset Splits | Yes | In the architecture adaptation process, we randomly sample 20% images from the training set as the validation set for architecture parameters updating. ... In the search process of architecture adaptation, we randomly sample 50% data from the original trainval35k set as the validation set. (See the data-split sketch after the table.) |
| Hardware Specification | Yes | The whole search process is conducted on a single V100 GPU and takes only 1.4 hours in total. ... The whole parameter adaptation process is conducted on 4 TITAN-Xp GPUs and takes 100K iterations, which cost only 8.5 hours in total. ... All our experiments on object detection are conducted on TITAN-Xp GPUs. |
| Software Dependencies | No | The paper mentions software like 'DeepLabv3', 'RetinaNet', 'SSDLite', 'MMDetection', 'SGD optimizer', 'Adam optimizer', 'RMSProp optimizer', but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | The batch size is set as 16. We use the SGD optimizer with 0.9 momentum and 5 × 10⁻⁴ weight decay for operation weights and the Adam optimizer (Kingma & Ba, 2015) with 4 × 10⁻⁵ weight decay and a fixed learning rate 1 × 10⁻³ for architecture parameters. (A hedged optimizer sketch follows the table.) |
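
The Pseudocode row above points to the paper's Algorithm 1 (weights remapping), which transfers parameters from the seed network into the super network before search. The snippet below is a minimal sketch of width-level remapping only, under the assumption that overlapping channels are copied from the seed weights and any extra channels are zero-initialized; the function name and shapes are illustrative, not the authors' implementation.

```python
import torch

def remap_width(seed_weight: torch.Tensor, new_out: int, new_in: int) -> torch.Tensor:
    """Width-level remapping sketch: copy the overlapping channel block of a
    conv weight (out_ch, in_ch, kH, kW) from the seed network into a new
    tensor and leave any additional channels zero-initialized (assumption)."""
    out_ch, in_ch, kh, kw = seed_weight.shape
    new_weight = torch.zeros(new_out, new_in, kh, kw)
    o, i = min(out_ch, new_out), min(in_ch, new_in)
    new_weight[:o, :i] = seed_weight[:o, :i]
    return new_weight

# Example: grow a 16-in/16-out 3x3 conv from the seed into a 24-out conv.
seed = torch.randn(16, 16, 3, 3)
print(remap_width(seed, new_out=24, new_in=16).shape)  # torch.Size([24, 16, 3, 3])
```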
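
The Dataset Splits row quotes a random 20% hold-out from the segmentation training set (and a 50% hold-out from trainval35k for detection) used to update the architecture parameters during search. Below is a minimal sketch of such a split with torch.utils.data.random_split; the placeholder dataset and seed are assumptions, not the authors' data pipeline.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder tensors standing in for the Cityscapes training images/labels (assumption).
full_train = TensorDataset(torch.randn(1000, 3, 64, 64), torch.randint(0, 19, (1000,)))

# Hold out 20% of the training images for architecture-parameter updates,
# keeping the remaining 80% for operation-weight updates.
val_size = int(0.2 * len(full_train))
weight_set, arch_set = random_split(
    full_train, [len(full_train) - val_size, val_size],
    generator=torch.Generator().manual_seed(0),  # seed value is illustrative
)
print(len(weight_set), len(arch_set))  # 800 200
```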
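
The Experiment Setup row pairs two optimizers: SGD (momentum 0.9, weight decay 5 × 10⁻⁴) for operation weights and Adam (weight decay 4 × 10⁻⁵, fixed learning rate 1 × 10⁻³) for architecture parameters. The sketch below wires up that pairing in PyTorch; the SGD learning rate and the modules holding each parameter group are placeholders, since the quoted setup does not specify them.

```python
import torch
import torch.nn as nn

# Placeholders standing in for the super network's operation weights and
# architecture parameters (assumption).
operation_weights = list(nn.Conv2d(32, 32, 3, padding=1).parameters())
arch_parameters = [nn.Parameter(torch.zeros(14, 8))]

# SGD for operation weights: 0.9 momentum, 5e-4 weight decay (as quoted);
# the learning rate here is illustrative only.
weight_optim = torch.optim.SGD(operation_weights, lr=0.01, momentum=0.9, weight_decay=5e-4)

# Adam for architecture parameters: fixed lr 1e-3, 4e-5 weight decay (as quoted).
arch_optim = torch.optim.Adam(arch_parameters, lr=1e-3, weight_decay=4e-5)
```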