Automated Synthetic-to-Real Generalization
Authors: Wuyang Chen, Zhiding Yu, Zhangyang Wang, Animashree Anandkumar
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that the proposed framework can significantly improve the synthetic-to-real generalization performance without seeing and training on real data, while also benefiting downstream tasks such as domain adaptation. Code is available at: https://github.com/NVlabs/ASG. Experiments also prove the cross-task generalizability of our proxy guidance, which magnifies the strength of synthetic-to-real transfer learning. |
| Researcher Affiliation | Collaboration | Wuyang Chen (Texas A&M University), Zhiding Yu (NVIDIA), Zhangyang Wang (Texas A&M University), Anima Anandkumar (NVIDIA, California Institute of Technology) |
| Pseudocode | Yes | Algorithm 1: RL-L2O: policy (π) learning to control group-wise learning rates. |
| Open Source Code | Yes | Code is available at: https://github.com/NVlabs/ASG. |
| Open Datasets | Yes | VisDA-17 (Peng et al., 2017), GTA5 (Richter et al., 2016), Cityscapes (Cordts et al., 2016), ImageNet (Deng et al., 2009) |
| Dataset Splits | Yes | The VisDA-17 dataset provides three subsets (domains), each with the same 12 object categories. Among them, the training set (source domain) is collected from synthetic renderings of 3D models under different angles and lighting conditions, whereas the validation set (target domain) contains real images cropped from the Microsoft COCO dataset (Lin et al., 2014). The model is trained for 30 epochs and λ for LKL is set as 0.1. The model is trained for 50 epochs, and λ for LKL is set as 75. We train π for 50 epochs. |
| Hardware Specification | No | The paper mentions "computing power supported by NVIDIA GPU infrastructure" in the acknowledgements, but does not provide specific details such as GPU models (e.g., V100, A100), CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using the "PyTorch official model" but does not specify its version or any other software dependencies (e.g., Python, CUDA, other libraries) with version numbers. |
| Experiment Setup | Yes | Image classification: Backbone is pre-trained on ImageNet (Deng et al., 2009), and then fine-tuned on the source domain, with learning rate = 1×10⁻⁴, weight decay = 5×10⁻⁴, momentum = 0.9, and batch size = 32. The model is trained for 30 epochs and λ for LKL is set as 0.1. Semantic segmentation: Our learning rate is 1×10⁻³, weight decay is 5×10⁻⁴, momentum is 0.9, and batch size is six. We crop the images into patches of 512×512 and train the model with multi-scale augmentation (0.75–1.25) and horizontal flipping. The model is trained for 50 epochs, and λ for LKL is set as 75. RL-L2O policy: We set the learning rate for policy training as 0.5. The size of the hidden state vector h is set to 20, and the unroll length U = 5. We train π for 50 epochs. |
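For readers attempting a re-run, the hyperparameters reported in the Experiment Setup row can be collected into a single configuration. The sketch below is a minimal, dependency-free summary of those values plus the overall objective shape (task loss plus a λ-weighted KL term, as implied by the quoted LKL weighting); the variable names (`CLS_SETUP`, `SEG_SETUP`, `total_loss`) are our own and do not come from the paper's released code.

```python
# Hedged sketch of the reported training configurations; names are
# placeholders, not identifiers from the NVlabs/ASG repository.

# Image classification on VisDA-17 (backbone pre-trained on ImageNet).
CLS_SETUP = {
    "lr": 1e-4,
    "weight_decay": 5e-4,
    "momentum": 0.9,
    "batch_size": 32,
    "epochs": 30,
    "lambda_kl": 0.1,
}

# Semantic segmentation (GTA5 -> Cityscapes).
SEG_SETUP = {
    "lr": 1e-3,
    "weight_decay": 5e-4,
    "momentum": 0.9,
    "batch_size": 6,
    "crop": (512, 512),
    "scale_range": (0.75, 1.25),  # multi-scale augmentation
    "epochs": 50,
    "lambda_kl": 75.0,
}

# RL-L2O policy training (Algorithm 1 in the paper).
POLICY_SETUP = {
    "lr": 0.5,
    "hidden_size": 20,  # size of hidden state vector h
    "unroll_length": 5,  # U = 5
    "epochs": 50,
}


def total_loss(task_loss: float, kl_loss: float, lam: float) -> float:
    """Overall objective shape: L = L_task + lambda * L_KL."""
    return task_loss + lam * kl_loss
```

For example, with the classification setting (`lambda_kl = 0.1`), a task loss of 1.0 and a KL term of 0.5 combine to `total_loss(1.0, 0.5, 0.1) == 1.05`.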