Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning
Authors: Jinxin Liu, Hao Shen, Donglin Wang, Yachen Kang, Qiangxing Tian
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct empirical experiments to demonstrate that our method can effectively learn skills that can be smoothly deployed in the target environment. Empirically, we demonstrate that our objective can obtain dynamics-aware rewards, enabling the goal-conditioned policy learned in a source environment to perform well in the target environment in various settings (stable and unstable settings, and sim2real). (A hedged sketch of such a dynamics-aware reward correction follows the table.) |
| Researcher Affiliation | Academia | Jinxin Liu¹²⁴, Hao Shen³, Donglin Wang²⁴, Yachen Kang¹²⁴, Qiangxing Tian¹²⁴. ¹Zhejiang University. ²Westlake University. ³UC Berkeley. ⁴Institute of Advanced Technology, Westlake Institute for Advanced Study. liujinxin@westlake.edu.cn, haoshen@berkeley.edu, {wangdonglin, kangyachen, tianqiangxing}@westlake.edu.cn |
| Pseudocode | Yes | Algorithm 1 (DARS) is presented on page 6 of the paper, detailing the steps of the proposed method. (A loose skeleton of such a training loop follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper uses standard RL environments such as MuJoCo and OpenAI Gym, plus custom 'Map' environments, but it does not present them as datasets with public-access information (link, DOI, or citation) in the sense required for data-collection reproducibility. |
| Dataset Splits | No | The paper describes training and evaluation within source and target environments (e.g., limited rollouts in target), but it does not specify explicit training/test/validation dataset splits (e.g., percentages or sample counts) typically found in supervised learning setups. |
| Hardware Specification | No | The paper mentions evaluating on simulated robots and a real quadruped robot, but it does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the simulations or training the models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., Python, deep learning frameworks like PyTorch or TensorFlow, or simulation environments). |
| Experiment Setup | Yes | For all tuples, we set β = 10 and the ratio of experience from the source environment vs. the target environment R = 10 (Line 13 in Algorithm 1). See Appendix F.3 for the other hyperparameters. (A hypothetical sketch of this experience-mixing setup follows the table.) |
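
One common way to obtain a dynamics-aware reward of this kind, used in closely related off-dynamics RL work, is to estimate the log-ratio of target vs. source transition probabilities from the log-odds of two domain classifiers. The PyTorch sketch below illustrates that idea; the architecture, function names, and label convention (index 0 = source, 1 = target) are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DomainClassifier(nn.Module):
    """Binary classifier over transitions; logits for [source, target]."""
    def __init__(self, in_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def dynamics_aware_bonus(cls_sas: DomainClassifier, cls_sa: DomainClassifier,
                         s: torch.Tensor, a: torch.Tensor,
                         s_next: torch.Tensor) -> torch.Tensor:
    """Estimate log p_target(s'|s,a) - log p_source(s'|s,a) via Bayes' rule:
    the (s,a)-only classifier cancels the bias from which domain the
    state-action pairs were collected in."""
    logp_sas = torch.log_softmax(cls_sas(torch.cat([s, a, s_next], -1)), -1)
    logp_sa = torch.log_softmax(cls_sa(torch.cat([s, a], -1)), -1)
    return ((logp_sas[..., 1] - logp_sas[..., 0])
            - (logp_sa[..., 1] - logp_sa[..., 0]))
```

In such a setup the bonus would be scaled by a coefficient like the paper's β and added to the intrinsic (skill) reward when training in the source environment.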
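
The Pseudocode row points at Algorithm 1 on page 6, and the paper itself is the authoritative reference. Purely as orientation, here is a loose Python skeleton of how such a loop is commonly structured (abundant source rollouts, occasional limited target rollouts, classifier fitting, reward relabeling, off-policy update). Every helper callable and schedule here is an assumption, not a transcription of Algorithm 1.

```python
def train_dars_like(policy, source_env, target_env, rollout, fit_classifiers,
                    relabel_rewards, rl_update, n_iters=1000, target_every=10):
    """Hypothetical skeleton: the callables (rollout, fit_classifiers,
    relabel_rewards, rl_update) are assumed to be supplied by the user."""
    source_buf, target_buf = [], []
    for it in range(n_iters):
        source_buf.extend(rollout(policy, source_env))    # cheap experience
        if it % target_every == 0:                        # limited target rollouts
            target_buf.extend(rollout(policy, target_env))
        clfs = fit_classifiers(source_buf, target_buf)    # domain classifiers
        # Relabel source rewards with the dynamics-aware correction,
        # then take an off-policy RL step (e.g., SAC) on the result.
        rl_update(policy, relabel_rewards(source_buf, clfs))
    return policy
```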
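
The quoted hyperparameters (β = 10 and the source-vs-target experience ratio R = 10) admit a simple reading as a per-update mixing rule. A minimal sketch, assuming list-like replay buffers and a batch size that are not from the paper:

```python
import random

BETA = 10.0   # weight on the dynamics-aware reward term (paper: beta = 10)
RATIO = 10    # R: source vs. target experience ratio (paper: R = 10)

def sample_mixed_batch(source_buffer: list, target_buffer: list,
                       batch_size: int = 256) -> list:
    """Draw roughly RATIO source transitions per target transition."""
    n_target = max(1, batch_size // (RATIO + 1))
    n_source = batch_size - n_target
    batch = (random.sample(source_buffer, n_source)
             + random.sample(target_buffer, n_target))
    random.shuffle(batch)
    return batch
```

Whether R governs batch mixing or rollout collection is determined by Line 13 of Algorithm 1 in the paper; the sketch above shows only the batch-mixing interpretation.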