Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning

Authors: Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on MuJoCo and Hand Manipulation Suite tasks show that agents deployed with our method achieve performance similar to that in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap.
Researcher Affiliation | Academia | National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing 210023, China; Pazhou Lab, Guangzhou 510330, China. {chenxh, jiangsy, xufeng}@lamda.nju.edu.cn, {zzzhang, yuy}@nju.edu.cn
Pseudocode | Yes | The detailed training procedure of CODAS is shown in Alg. 1 in the Appendix.
Open Source Code | Yes | Implementation details for all these methods can be found in Appendix D, and the source code is available at https://github.com/xionghuichen/codas.
Open Datasets | Yes | We evaluate our method on MuJoCo [12] tasks from OpenAI Gym and on Robot Hand Manipulation tasks.
Dataset Splits | No | The paper mentions training and evaluating its method, including a "pre-collected dataset" and "batches of 20 trajectories" for updates, but does not explicitly state train/validation/test splits for the mapping function.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using MuJoCo, PPO, and DAPG, but does not specify software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x).
Experiment Setup | Yes | The hyper-parameters of CODAS are λD = 10 and α = 10. The learning rate for training qφ and pθ is 1e-4, and for Dω and pϕ it is 1e-5. The batch size is 20 for all methods. Tasks are trained for 10,000 to 40,000 epochs depending on their difficulty. (from Appendix E)
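The reported setup can be summarized as a small configuration sketch. This is a minimal, hypothetical rendering of the Appendix E hyper-parameters; the key names and the `epochs_for` helper are illustrative and do not come from the official CODAS repository.

```python
# Hypothetical configuration mirroring the hyper-parameters reported
# in Appendix E of the paper; names are illustrative, not from the
# CODAS codebase.
codas_config = {
    "lambda_D": 10,               # discriminator loss weight λD
    "alpha": 10,                  # α hyper-parameter
    "lr_encoder_decoder": 1e-4,   # learning rate for qφ and pθ
    "lr_discriminator": 1e-5,     # learning rate for Dω and pϕ
    "batch_size": 20,             # trajectories per training batch
    "epoch_range": (10_000, 40_000),  # chosen per task difficulty
}

def epochs_for(difficulty: float, config: dict = codas_config) -> int:
    """Pick an epoch budget in the reported range from a
    normalized task-difficulty score in [0, 1] (illustrative)."""
    lo, hi = config["epoch_range"]
    return int(lo + difficulty * (hi - lo))
```

A harder task would simply map to a larger budget, e.g. `epochs_for(1.0)` returns the 40,000-epoch upper bound.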