Fast Neural Network Adaptation via Parameter Remapping and Architecture Search

Authors: Jiemin Fang*, Yuzhu Sun*, Kangjian Peng*, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we conduct FNA on MobileNetV2 to obtain new networks for both segmentation and detection that clearly outperform existing networks designed both manually and by NAS.
Researcher Affiliation | Collaboration | Jiemin Fang¹, Yuzhu Sun¹, Kangjian Peng², Qian Zhang², Yuan Li², Wenyu Liu¹, Xinggang Wang¹ (¹School of EIC, Huazhong University of Science and Technology; ²Horizon Robotics)
Pseudocode | Yes | Algorithm 1: Weights Remapping Function (a hedged sketch of such a remapping follows the table)
Open Source Code | Yes | The code is available at https://github.com/JaminFong/FNA.
Open Datasets | Yes | The semantic segmentation experiments are conducted on the Cityscapes (Cordts et al., 2016) dataset. ... The experiments are conducted on the MS-COCO dataset (Lin et al., 2014b).
Dataset Splits | Yes | In the architecture adaptation process, we randomly sample 20% images from the training set as the validation set for architecture parameters updating. ... In the search process of architecture adaptation, we randomly sample 50% data from the original trainval35k set as the validation set. (A split sketch follows the table.)
Hardware Specification | Yes | The whole search process is conducted on a single V100 GPU and takes only 1.4 hours in total. ... The whole parameter adaptation process is conducted on 4 TITAN-Xp GPUs and takes 100K iterations, which cost only 8.5 hours in total. ... All our experiments on object detection are conducted on TITAN-Xp GPUs.
Software Dependencies | No | The paper mentions software like 'DeepLabv3', 'RetinaNet', 'SSDLite', 'MMDetection', 'SGD optimizer', 'Adam optimizer', 'RMSProp optimizer', but does not provide specific version numbers for these or other key software dependencies.
Experiment Setup | Yes | The batch size is set as 16. We use the SGD optimizer with 0.9 momentum and 5×10⁻⁴ weight decay for operation weights and the Adam optimizer (Kingma & Ba, 2015) with 4×10⁻⁵ weight decay and a fixed learning rate 1×10⁻³ for architecture parameters. (An optimizer sketch follows the table.)
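The Pseudocode row refers to the paper's Algorithm 1 (Weights Remapping Function), whose exact rules are not quoted here. The sketch below is a minimal illustration of one plausible remapping for a single convolution weight, assuming center-aligned kernel expansion with zero padding and leading-channel copying; the function name `remap_conv_weight` and these specific rules are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of remapping a seed convolution weight into a larger
# (or smaller) candidate shape. The width and kernel-size rules below are
# assumptions; the authoritative version is Algorithm 1 in the paper.
import torch


def remap_conv_weight(seed_w: torch.Tensor, out_c: int, in_c: int, k: int) -> torch.Tensor:
    """Remap a seed conv weight of shape (oc, ic, kh, kw) to (out_c, in_c, k, k)."""
    oc, ic, kh, kw = seed_w.shape  # square kernels assumed
    new_w = torch.zeros(out_c, in_c, k, k, dtype=seed_w.dtype)

    # Width remapping (assumed rule): copy the leading channels shared by both shapes.
    o, i = min(oc, out_c), min(ic, in_c)

    # Kernel-size remapping (assumed rule): align kernels at their centers; a larger
    # target kernel keeps zeros in its outer ring, a smaller one crops the center.
    src_k = min(kh, k)
    s_off = (kh - src_k) // 2   # offset into the seed kernel
    d_off = (k - src_k) // 2    # offset into the new kernel
    new_w[:o, :i, d_off:d_off + src_k, d_off:d_off + src_k] = \
        seed_w[:o, :i, s_off:s_off + src_k, s_off:s_off + src_k]
    return new_w


# Example: expand a MobileNetV2-style 3x3 kernel to a 5x5 search-space candidate.
seed = torch.randn(32, 32, 3, 3)
print(remap_conv_weight(seed, out_c=32, in_c=32, k=5).shape)  # torch.Size([32, 32, 5, 5])
```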
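The Dataset Splits row quotes sampling 20% of the training images as a validation set for updating architecture parameters. A minimal sketch of such a split is shown below, assuming a PyTorch `Dataset`; `cityscapes_train`, the helper name `split_for_search`, and the fixed seed are placeholders rather than details from the paper.

```python
# Minimal sketch of carving out a search-time validation split from the
# training set; all names here are placeholders, not the paper's code.
import torch
from torch.utils.data import random_split


def split_for_search(train_set, val_fraction=0.2, seed=0):
    n_val = int(len(train_set) * val_fraction)
    n_train = len(train_set) - n_val
    generator = torch.Generator().manual_seed(seed)
    # Operation weights train on the first subset; architecture parameters
    # are updated on the held-out second subset.
    return random_split(train_set, [n_train, n_val], generator=generator)


# search_train, search_val = split_for_search(cityscapes_train)  # 20% for Cityscapes
# det_train, det_val = split_for_search(coco_trainval35k, val_fraction=0.5)  # 50% for COCO
```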
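The Experiment Setup row quotes a two-optimizer configuration. The sketch below wires those hyperparameters into PyTorch optimizers, assuming a DARTS-style separation between operation weights and architecture parameters; the SGD learning rate `LR_W` is a placeholder because the excerpt does not state it.

```python
# Sketch of the quoted two-optimizer setup: SGD for operation weights,
# Adam for architecture parameters.
import torch

LR_W = 0.01  # placeholder value; the excerpt does not give the SGD learning rate


def build_optimizers(operation_weights, arch_params):
    w_opt = torch.optim.SGD(operation_weights, lr=LR_W,
                            momentum=0.9, weight_decay=5e-4)
    a_opt = torch.optim.Adam(arch_params, lr=1e-3,
                             weight_decay=4e-5)
    return w_opt, a_opt
```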