AUGCAL: Improving Sim2Real Adaptation by Uncertainty Calibration on Augmented Synthetic Images

Authors: Prithvijit Chattopadhyay, Bharat Goyal, Boglarka Ecsedi, Viraj Uday Prabhu, Judy Hoffman

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through our experiments, we empirically show the efficacy of AUGCAL across multiple adaptation methods, backbones, tasks and shifts.
Researcher Affiliation | Academia | Prithvijit Chattopadhyay, Bharat Goyal, Boglarka Ecsedi, Viraj Prabhu, Judy Hoffman, Georgia Tech, {prithvijit3,bharatgoyal,becsedi3,virajp,judy}@gatech.edu
Pseudocode | No | The paper describes the steps for PASTA in Section A.2, but they are presented as a numbered list within a paragraph rather than as a formally structured pseudocode or algorithm block. For example: “1. Set α = 3.0, β = 0.25, k = 2 for PASTA. 2. Use FFT (Nussbaumer, 1981) to obtain the Fourier spectrum of synthetic image x, as F(x) = FFT(x) ∈ ℂ^{C×H×W}”, etc. This format does not qualify as pseudocode.
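The PASTA steps quoted above translate naturally into a short script. The NumPy sketch below follows the listed order (FFT, amplitude perturbation, inverse FFT); the function name `pasta_sketch` and the frequency weighting `sigma = (radius ** alpha) * k + beta` are illustrative assumptions rather than the authors' exact formulation, which is given in Sec. A.2 of the paper.

```python
# Minimal sketch (not the authors' code) of the FFT-based augmentation steps quoted above.
# The frequency-dependent jitter strength sigma = (radius ** alpha) * k + beta is an
# assumption for illustration; see Sec. A.2 of the paper for the exact PASTA rule.
import numpy as np

def pasta_sketch(x, alpha=3.0, beta=0.25, k=2.0):
    """x: float image array of shape (C, H, W) with values in [0, 1]."""
    # 1. Fourier spectrum of the synthetic image, F(x) in C^{C x H x W}
    F = np.fft.fft2(x, axes=(-2, -1))
    amplitude, phase = np.abs(F), np.angle(F)

    # 2. Normalized radial frequency for every (H, W) location
    C, H, W = x.shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius = radius / radius.max()

    # 3. Perturb the amplitude spectrum, with stronger jitter at higher frequencies
    sigma = (radius ** alpha) * k + beta          # assumed weighting
    noise = np.random.randn(C, H, W) * sigma      # broadcast over channels
    amplitude = amplitude * (1.0 + noise)

    # 4. Recombine with the original phase and invert the FFT
    x_aug = np.fft.ifft2(amplitude * np.exp(1j * phase), axes=(-2, -1)).real
    return np.clip(x_aug, 0.0, 1.0)
```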
Open Source Code | No | Most of our experiments follow the implementations of SIM2REAL adaptation methods from (Hoyer et al., 2022c), with our additional modifications on top. Additionally, in Sec. A.10 of the appendix, we provide details surrounding the assets (and corresponding licenses) used for our experiments. This only states that they built on existing open-source code, not that their specific contribution (AUGCAL) is open-sourced or linked. The provided link (lhoyer/MIC) is for the base method, not for AUGCAL’s specific implementation.
Open Datasets | Yes | For SemSeg, we conduct experiments on the GTAV → Cityscapes shift. GTAV (Sankaranarayanan et al., 2018) consists of 25k densely annotated SIM ground-view images and Cityscapes (Cordts et al., 2016) consists of 5k REAL ground-view images. ... For ObjRec, we conduct experiments on the VisDA SIM2REAL benchmark. VisDA (Peng et al., 2017) consists of 152k SIM images and 55k REAL images across 12 classes.
Dataset Splits | Yes | Specifically, we use 80% of VisDA SIM images for training models and the rest (20%) for validation and temperature tuning.
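For concreteness, here is a minimal sketch of the 80/20 split of VisDA SIM images into training and validation/temperature-tuning sets described in the row above. The seed, the helper name `split_visda_sim`, and whether the split is stratified by class are assumptions; the quoted text does not specify them.

```python
# Illustrative 80/20 split of the VisDA SIM images for training vs. validation /
# temperature tuning. Seed and stratification are assumptions, not from the paper.
import random

def split_visda_sim(image_paths, val_fraction=0.2, seed=0):
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]   # (train 80%, val 20%)
```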
Hardware Specification | Yes | Compute. We conduct all object recognition experiments on RTX 6000 GPUs, with every experiment requiring a single GPU. For semantic segmentation, we use one A40 GPU per experiment.
Software Dependencies | No | We use PyTorch (Paszke et al., 2019) as the deep-learning framework for all our experiments. No specific version number for PyTorch or other libraries is provided.
Experiment Setup | Yes | For EntMin (Vu et al., 2019) (with DAFormer / DeepLabv2), we use SGD as the optimizer with a learning rate of 2.5 × 10^-4 and use λ_UDA = 0.001 as the coefficient of the unconstrained entropy loss. For HRDA, we use the multi-resolution self-training strategy from (Hoyer et al., 2022b) and AdamW (Loshchilov & Hutter, 2017) as the optimizer, with a learning rate of 6 × 10^-5 for the encoder and 6 × 10^-4 for the decoder, with a linear learning rate warmup (warmup iterations 1500; warmup ratio 10^-6), followed by polynomial decay (to an eventual learning rate of 0). ... For all SemSeg settings, we use a batch size of 2 (2 source images, 2 target images) ... We train all segmentation models for 40k iterations... For ObjRec, ... we use SGD as the base optimizer with a learning rate of 2 × 10^-4, with a batch size of 32 (for both source and target). ... We train ResNet-101 backbones for 30 epochs and ViT-B/16 backbones for 15 epochs... For MIC, following prior work (Hoyer et al., 2022c), we use a masking patch size of 64, a masking ratio of 0.7, a loss weight of 1, and an EMA factor of 0.999 for the pseudo-label generating teacher.
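The quoted HRDA hyperparameters map onto a standard PyTorch optimizer/scheduler pair. The sketch below is a minimal illustration, not the authors' released code: AdamW with learning rates 6e-5 (encoder) and 6e-4 (decoder), a 1500-iteration linear warmup from a ratio of 1e-6, and polynomial decay to 0 over the 40k training iterations. The polynomial power of 1.0 and the helper name `build_optimizer_and_scheduler` are assumptions, since the quote only says "polynomial decay".

```python
# Minimal PyTorch sketch of the HRDA-style optimizer setup quoted above.
# Assumed: polynomial power of 1.0; everything else follows the quoted hyperparameters.
import torch

def build_optimizer_and_scheduler(encoder, decoder,
                                  total_iters=40_000,
                                  warmup_iters=1_500,
                                  warmup_ratio=1e-6,
                                  poly_power=1.0):
    optimizer = torch.optim.AdamW([
        {"params": encoder.parameters(), "lr": 6e-5},   # encoder learning rate
        {"params": decoder.parameters(), "lr": 6e-4},   # decoder learning rate
    ])

    def lr_lambda(step):
        if step < warmup_iters:
            # linear warmup from warmup_ratio * base_lr up to base_lr
            return warmup_ratio + (1.0 - warmup_ratio) * step / warmup_iters
        # polynomial decay from base_lr down to 0 at total_iters
        progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
        return (1.0 - min(progress, 1.0)) ** poly_power

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

Because the warmup and decay horizons above are specified in iterations rather than epochs, the scheduler would be stepped once per training iteration (scheduler.step() after each optimizer.step()).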