Automated Synthetic-to-Real Generalization

Authors: Wuyang Chen, Zhiding Yu, Zhangyang Wang, Animashree Anandkumar

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that the proposed framework can significantly improve the synthetic-to-real generalization performance without seeing and training on real data, while also benefiting downstream tasks such as domain adaptation. Code is available at: https://github.com/NVlabs/ASG. Experiments also prove the cross-task generalizability of our proxy guidance, which magnifies the strength of synthetic-to-real transfer learning.
Researcher Affiliation | Collaboration | Wuyang Chen (Texas A&M University), Zhiding Yu (NVIDIA), Zhangyang Wang (Texas A&M University), Anima Anandkumar (NVIDIA, California Institute of Technology)
Pseudocode | Yes | Algorithm 1: RL-L2O: policy (π) learning to control group-wise learning rates. (A hedged sketch of such a policy follows the table.)
Open Source Code | Yes | Code is available at: https://github.com/NVlabs/ASG.
Open Datasets | Yes | VisDA-17 (Peng et al., 2017), GTA5 (Richter et al., 2016), Cityscapes (Cordts et al., 2016), ImageNet (Deng et al., 2009)
Dataset Splits | Yes | The VisDA-17 dataset provides three subsets (domains), each with the same 12 object categories. Among them, the training set (source domain) is collected from synthetic renderings of 3D models under different angles and lighting conditions, whereas the validation set (target domain) contains real images cropped from the Microsoft COCO dataset (Lin et al., 2014). The classification model is trained for 30 epochs and λ for L_KL is set as 0.1. The segmentation model is trained for 50 epochs, and λ for L_KL is set as 75. We train π for 50 epochs.
Hardware Specification | No | The paper mentions "computing power supported by NVIDIA GPU infrastructure" in the acknowledgements, but does not provide specific details such as GPU models (e.g., V100, A100), CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using "PyTorch official model" but does not specify its version or any other software dependencies (e.g., Python, CUDA, other libraries) with version numbers.
Experiment Setup | Yes | Image classification: the backbone is pre-trained on ImageNet (Deng et al., 2009) and then fine-tuned on the source domain with learning rate = 1×10^-4, weight decay = 5×10^-4, momentum = 0.9, and batch size = 32. The model is trained for 30 epochs and λ for L_KL is set as 0.1. Semantic segmentation: the learning rate is 1×10^-3, weight decay is 5×10^-4, momentum is 0.9, and batch size is six. Images are cropped into 512×512 patches and the model is trained with multi-scale augmentation (0.75–1.25) and horizontal flipping. The model is trained for 50 epochs, and λ for L_KL is set as 75. RL-L2O policy: the learning rate for policy training is 0.5, the size of the hidden state vector h is 20, and the unroll length is U = 5. We train π for 50 epochs. (A hedged sketch of the classification setup follows the table.)
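
The Pseudocode row above points to Algorithm 1 (RL-L2O), where a policy π learns to control group-wise learning rates. Below is a minimal PyTorch sketch of one REINFORCE-style policy update using the reported settings (hidden state size 20, unroll length U = 5, policy learning rate 0.5); the feature inputs, discrete action space, LSTM architecture, and reward placeholder are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LRPolicy(nn.Module):
    """RNN policy proposing one learning-rate action per parameter group.

    The hidden size (20) follows the reported RL-L2O settings; the feature
    inputs, action space, and LSTM architecture are illustrative assumptions.
    """
    def __init__(self, num_groups, num_actions=3, feat_dim=4, hidden_dim=20):
        super().__init__()
        self.num_groups, self.num_actions = num_groups, num_actions
        self.rnn = nn.LSTMCell(num_groups * feat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, num_groups * num_actions)

    def forward(self, feats, state):
        h, c = self.rnn(feats, state)
        logits = self.head(h).view(self.num_groups, self.num_actions)
        return torch.distributions.Categorical(logits=logits), (h, c)

policy = LRPolicy(num_groups=4)
opt = torch.optim.SGD(policy.parameters(), lr=0.5)   # reported policy learning rate
state = (torch.zeros(1, 20), torch.zeros(1, 20))     # reported hidden size 20

log_probs = []
for _ in range(5):                                   # reported unroll length U = 5
    feats = torch.randn(1, 4 * 4)                    # placeholder training statistics
    dist, state = policy(feats, state)
    actions = dist.sample()                          # one LR choice per parameter group
    log_probs.append(dist.log_prob(actions).sum())
    # ...apply the chosen group-wise learning rates to the main model's optimizer...

reward = torch.tensor(0.0)                           # placeholder, e.g. validation-accuracy gain
loss = -reward * torch.stack(log_probs).sum()        # REINFORCE objective
opt.zero_grad()
loss.backward()
opt.step()
```

In the actual algorithm, the sampled actions would set per-group learning rates on the main model's optimizer, and the reward would be derived from the resulting performance of the fine-tuned model.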
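
As a concrete reading of the Experiment Setup row, the sketch below assembles the reported classification hyperparameters (SGD with learning rate 1×10^-4, weight decay 5×10^-4, momentum 0.9, λ = 0.1 for L_KL) into a fine-tuning step guided by a frozen ImageNet-pretrained copy of the backbone. The ResNet-101 backbone choice, the task-head handling, and the exact KL formulation are assumptions based on the paper's description of proxy guidance.

```python
import copy
import torch
import torch.nn.functional as F
from torchvision import models

# Reported classification setup: ImageNet-pretrained backbone fine-tuned on the
# synthetic source domain. ResNet-101 is an assumed backbone choice; in practice
# the task head would be adapted to the 12 VisDA-17 classes (omitted for brevity).
model = models.resnet101(pretrained=True)
frozen = copy.deepcopy(model).eval()                 # frozen ImageNet model (proxy guidance)
for p in frozen.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,        # reported learning rate
                            momentum=0.9, weight_decay=5e-4)    # reported values
lam = 0.1                                            # reported lambda for L_KL (classification)

def training_step(images, labels):
    """One fine-tuning step; the exact form of L_KL here is an assumption."""
    logits = model(images)
    task_loss = F.cross_entropy(logits, labels)
    with torch.no_grad():
        ref = frozen(images)                         # predictions of the frozen model
    # KL divergence between current and frozen predictions (proxy guidance).
    kl = F.kl_div(F.log_softmax(logits, dim=1),
                  F.softmax(ref, dim=1), reduction="batchmean")
    loss = task_loss + lam * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Per the table, the segmentation recipe follows the same pattern with learning rate 1×10^-3, batch size six, and λ = 75.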