Prompt-based Visual Alignment for Zero-shot Policy Transfer

Authors: Haihan Gao, Rui Zhang, Qi Yi, Hantao Yao, Haochen Li, Jiaming Guo, Shaohui Peng, Yunkai Gao, Qicheng Wang, Xing Hu, Yuanbo Wen, Zihao Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the agent generalizes well on unseen domains under limited access to multidomain data.
Researcher Affiliation | Academia | 1 University of Science and Technology of China; 2 SKL of Processors, Institute of Computing Technology, CAS; 3 Institute of Automation, Chinese Academy of Sciences; 4 Institute of Software, Chinese Academy of Sciences; 5 University of Chinese Academy of Sciences, China; 6 Shanghai Innovation Center for Processor Technologies.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology: there is no repository link and no explicit code release statement.
Open Datasets | Yes | We verify the proposed PVA with the CARLA simulator (Dosovitskiy et al., 2017).
Dataset Splits | Yes | Clear Noon and Hard Rain Noon are used to tune the prompt and train the visual aligner. We validate the agent's performance on Wet Cloudy Sunset, Clear Sunset, and Soft Rain Sunset, which do not appear in the training stage.
Hardware Specification | Yes | Prompt tuning took 4 hours and visual alignment took about 8 hours; both were conducted with 4 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions specific models and frameworks like CLIP, UNet, and PPO, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | In the experiment, we sample only 100 images from 2 different domains to extract semantic information and build the visual aligner. The lengths of the global, domain-specific, and instance prompts are 10, 5, and 10. We also applied different temperatures and learning rates to align the learnable prompts with the visual inputs. For general and domain-specific prompts, the temperature and learning rate are 0.5 and 0.0004. For instance-specific prompts, the temperature and learning rate are 0.1 and 0.00005. ... In the experiments, we select λv = 1, λcol = λout = 100.
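For quick reference, the hyperparameters quoted above can be gathered into a single configuration sketch. The key names below are illustrative assumptions, not identifiers from the paper or its (unreleased) code:

```python
# Hypothetical config collecting the hyperparameters reported in the paper.
# All key names are assumptions; only the numeric values come from the quoted text.
PVA_CONFIG = {
    "images_sampled": 100,        # images drawn from 2 source domains
    "source_domains": 2,          # Clear Noon and Hard Rain Noon
    "prompt_length": {            # tokens per learnable prompt
        "global": 10,
        "domain_specific": 5,
        "instance": 10,
    },
    "prompt_tuning": {            # general / domain-specific prompts
        "temperature": 0.5,
        "learning_rate": 4e-4,
    },
    "instance_prompt_tuning": {   # instance-specific prompts
        "temperature": 0.1,
        "learning_rate": 5e-5,
    },
    "loss_weights": {             # λv, λcol, λout from the paper
        "lambda_v": 1,
        "lambda_col": 100,
        "lambda_out": 100,
    },
}
```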