Prompt-based Visual Alignment for Zero-shot Policy Transfer
Authors: Haihan Gao, Rui Zhang, Qi Yi, Hantao Yao, Haochen Li, Jiaming Guo, Shaohui Peng, Yunkai Gao, Qicheng Wang, Xing Hu, Yuanbo Wen, Zihao Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that the agent generalizes well on unseen domains under limited access to multi-domain data. |
| Researcher Affiliation | Academia | 1University of Science and Technology of China 2SKL of Processors, Institute of Computing Technology, CAS 3Institute of Automation, Chinese Academy of Sciences 4Institute of Software, Chinese Academy of Sciences 5University of Chinese Academy of Sciences, China 6Shanghai Innovation Center for Processor Technologies. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it include a specific repository link or explicit code release statement. |
| Open Datasets | Yes | We verify the proposed PVA with the CARLA simulator (Dosovitskiy et al., 2017). |
| Dataset Splits | Yes | Clear Noon and Hard Rain Noon are used to tune the prompt and train the visual aligner. We validate the agent's performance on Wet Cloudy Sunset, Clear Sunset, and Soft Rain Sunset, which do not appear in the training stage. |
| Hardware Specification | Yes | Prompt tuning took 4 hours and visual alignment took about 8 hours; both were conducted with 4 NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions specific models and frameworks like CLIP, UNet, and PPO, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In the experiment, we sample only 100 images from 2 different domains to extract semantic information and build the visual aligner. The lengths of the global, domain-specific, and instance prompts are 10, 5, and 10, respectively. We also applied different temperatures and learning rates to align the learnable prompts with the visual inputs. For general and domain-specific prompts, the temperature and learning rate are 0.5 and 0.0004. For instance-specific prompts, the temperature and learning rate are 0.1 and 0.00005. ... In the experiments, we select λv = 1, λcol = λout = 100 |
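The hyperparameters and domain splits quoted in the table can be collected into a single configuration sketch. This is not code from the paper: the key and variable names below are illustrative, and only the numeric values and domain names are taken from the quoted text.

```python
# Illustrative configuration assembling the reported PVA hyperparameters.
# Key names are our own; values are quoted from the paper's experiment setup.
PVA_CONFIG = {
    "num_sampled_images": 100,        # sampled from 2 domains to build the aligner
    "prompt_lengths": {               # token lengths of the learnable prompts
        "global": 10,
        "domain_specific": 5,
        "instance": 10,
    },
    "prompt_tuning": {                # general / domain-specific prompts
        "temperature": 0.5,
        "learning_rate": 4e-4,
    },
    "instance_prompt_tuning": {       # instance-specific prompts
        "temperature": 0.1,
        "learning_rate": 5e-5,
    },
    "loss_weights": {                 # λv, λcol, λout
        "lambda_v": 1,
        "lambda_col": 100,
        "lambda_out": 100,
    },
}

# Train/eval split over CARLA weather presets, as described in the
# "Dataset Splits" row; the eval domains are unseen during training.
CARLA_WEATHER_SPLITS = {
    "train": ["Clear Noon", "Hard Rain Noon"],
    "eval": ["Wet Cloudy Sunset", "Clear Sunset", "Soft Rain Sunset"],
}
```

As a sanity check, the train and eval weather sets are disjoint, which is what makes the evaluation domains genuinely unseen.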