Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning

Authors: Jinxin Liu, Hao Shen, Donglin Wang, Yachen Kang, Qiangxing Tian

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also conduct empirical experiments to demonstrate that our method can effectively learn skills that can be smoothly deployed in the target environment. Empirically, we demonstrate that our objective can obtain dynamics-aware rewards, enabling the goal-conditioned policy learned in a source environment to perform well in the target environment in various settings (stable and unstable settings, and sim2real).
Researcher Affiliation | Academia | Jinxin Liu (1,2,4), Hao Shen (3), Donglin Wang (2,4), Yachen Kang (1,2,4), Qiangxing Tian (1,2,4). 1 Zhejiang University; 2 Westlake University; 3 UC Berkeley; 4 Institute of Advanced Technology, Westlake Institute for Advanced Study. Emails: liujinxin@westlake.edu.cn, haoshen@berkeley.edu, {wangdonglin, kangyachen, tianqiangxing}@westlake.edu.cn
Pseudocode | Yes | Algorithm 1 (DARS) is presented on page 6 of the paper and details the steps of the proposed method.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described method.
Open Datasets | No | The paper uses standard RL environments (MuJoCo, OpenAI Gym) and custom 'Map' environments, but it does not treat them as datasets with access information (link, DOI, or citation) establishing public availability for reproducibility.
Dataset Splits | No | The paper describes training and evaluation in source and target environments (e.g., limited rollouts in the target environment), but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) of the kind found in supervised learning setups.
Hardware Specification | No | The paper mentions evaluating on simulated robots and a real quadruped robot, but it does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the simulations or train the models.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., Python, deep learning frameworks such as PyTorch or TensorFlow, or simulation environments).
Experiment Setup | Yes | For all tuples, we set β = 10 and the ratio of experience from the source environment vs. the target environment to R = 10 (Line 13 in Algorithm 1). See Appendix F.3 for the other hyperparameters.
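The quoted settings (β = 10 and a 10:1 source-to-target experience ratio) are the main knobs reported for Algorithm 1. Below is a minimal Python sketch of how such settings could enter a reward-shaping and replay-sampling step. The helper names (dynamics_aware_reward, sample_batch, source_buffer, target_buffer) and the exact form of the correction term are assumptions for illustration, not the authors' released code.

import numpy as np

# Assumed constants taken from the reported experiment setup.
BETA = 10      # weight on the dynamics-aware reward correction (paper: beta = 10)
RATIO_R = 10   # source-vs-target experience ratio (paper: R = 10, Line 13 of Algorithm 1)

def dynamics_aware_reward(r_skill, delta_r, beta=BETA):
    # Combine an unsupervised skill reward with a dynamics correction term.
    # delta_r is assumed to be a classifier-estimated log-ratio of target vs.
    # source dynamics, log p_target(s'|s,a) - log p_source(s'|s,a); the exact
    # combination used in the paper may differ.
    return r_skill + beta * delta_r

def sample_batch(source_buffer, target_buffer, batch_size=256, ratio=RATIO_R):
    # Draw roughly `ratio` source transitions for every target transition
    # (assumed sampling scheme for the reported R = 10).
    n_target = max(1, batch_size // (ratio + 1))
    n_source = batch_size - n_target
    idx_s = np.random.randint(len(source_buffer), size=n_source)
    idx_t = np.random.randint(len(target_buffer), size=n_target)
    return [source_buffer[i] for i in idx_s] + [target_buffer[i] for i in idx_t]

With batch_size = 256 and ratio = 10, each update would draw about 233 source transitions and 23 target transitions, matching the reported 10:1 mixture; Appendix F.3 of the paper lists the remaining hyperparameters.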