Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning
Authors: Xiong-Hui Chen, Shengyi Jiang, Feng Xu, Zongzhang Zhang, Yang Yu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on MuJoCo and Hand Manipulation Suite tasks show that the agents deployed with our method achieve similar performance as they have in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap. |
| Researcher Affiliation | Academia | National Key Laboratory of Novel Software Technology, Nanjing University, Nanjing 210023, China; Pazhou Lab, Guangzhou 510330, China; {chenxh, jiangsy, xufeng}@lamda.nju.edu.cn, {zzzhang, yuy}@nju.edu.cn |
| Pseudocode | Yes | The detailed training procedure of CODAS is shown in Alg. 1 in the Appendix. |
| Open Source Code | Yes | Implementation details for all these methods can be found in Appendix D and the source code is available at https://github.com/xionghuichen/codas. |
| Open Datasets | Yes | We evaluate our method in MuJoCo [12] from OpenAI Gym and Robot Hand Manipulation Tasks. |
| Dataset Splits | No | The paper mentions training and evaluating its method, including using a 'pre-collected dataset' and 'batches of 20 trajectories' for updates, but does not explicitly state specific train/validation/test dataset splits for the mapping function. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using MuJoCo, PPO, and DAPG, but it does not specify software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | The hyperparameters of CODAS are λ_D = 10 and α = 10. The learning rate for the training of q_φ and p_θ is 1e-4, and for D_ω and p_ϕ it is 1e-5. The batch size for training is 20 for all methods. The tasks are trained for 10,000 to 40,000 epochs depending on their difficulty. (from Appendix E; see the configuration sketch below) |
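For reference, the reported settings could be collected in a single configuration object along these lines. This is a minimal sketch: the field names (e.g. `lr_vae`, `lr_discriminator`) are hypothetical and not taken from the official repository; only the values come from Appendix E of the paper.

```python
# Hypothetical configuration sketch of the CODAS hyperparameters
# reported in Appendix E. Field names are illustrative only.
from dataclasses import dataclass


@dataclass
class CODASConfig:
    lambda_d: float = 10.0        # discriminator loss weight lambda_D
    alpha: float = 10.0           # loss weight alpha
    lr_vae: float = 1e-4          # learning rate for q_phi and p_theta
    lr_discriminator: float = 1e-5  # learning rate for D_omega and p_phi
    batch_size: int = 20          # trajectories per training batch
    epochs: int = 10_000          # 10,000 to 40,000 depending on task difficulty


config = CODASConfig()
print(config)
```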