Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning
Authors: Jinxin Liu, Hao Shen, Donglin Wang, Yachen Kang, Qiangxing Tian
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct empirical experiments to demonstrate that our method can effectively learn skills that can be smoothly deployed in the target environment. Empirically, we demonstrate that our objective can obtain dynamics-aware rewards, enabling the goal-conditioned policy learned in a source environment to perform well in the target environment in various settings (stable and unstable settings, and sim2real). (A hedged sketch of such a dynamics-aware reward correction follows the table.) |
| Researcher Affiliation | Academia | Jinxin Liu¹²⁴, Hao Shen³, Donglin Wang²⁴, Yachen Kang¹²⁴, Qiangxing Tian¹²⁴. ¹Zhejiang University. ²Westlake University. ³UC Berkeley. ⁴Institute of Advanced Technology, Westlake Institute for Advanced Study. liujinxin@westlake.edu.cn, haoshen@berkeley.edu, {wangdonglin, kangyachen, tianqiangxing}@westlake.edu.cn |
| Pseudocode | Yes | Algorithm 1 (DARS) is presented on page 6 of the paper, detailing the steps of the proposed method. (A loose skeleton of such a training loop follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper uses standard RL environments such as MuJoCo and OpenAI Gym, plus custom 'Map' environments, but it does not present them as datasets with public-access information (link, DOI, or citation) in the sense required for data-collection reproducibility. |
| Dataset Splits | No | The paper describes training and evaluation within source and target environments (e.g., limited rollouts in target), but it does not specify explicit training/test/validation dataset splits (e.g., percentages or sample counts) typically found in supervised learning setups. |
| Hardware Specification | No | The paper mentions evaluating on simulated robots and a real quadruped robot, but it does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used for running the simulations or training the models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., Python, deep learning frameworks like PyTorch or TensorFlow, or simulation environments). |
| Experiment Setup | Yes | For all tuples, we set β = 10 and the ratio of experience from the source environment vs. the target environment R = 10 (Line 13 in Algorithm 1). See Appendix F.3 for the other hyperparameters. (A hypothetical sketch of this experience-mixing setup follows the table.) |
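
One common way to obtain a dynamics-aware reward of this kind, used in closely related off-dynamics RL work, is to estimate the log-ratio of target vs. source transition probabilities from the log-odds of two domain classifiers. The PyTorch sketch below illustrates that idea; the architecture, function names, and label convention (index 0 = source, 1 = target) are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DomainClassifier(nn.Module):
    """Binary classifier over transitions; logits for [source, target]."""
    def __init__(self, in_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def dynamics_aware_bonus(cls_sas: DomainClassifier, cls_sa: DomainClassifier,
                         s: torch.Tensor, a: torch.Tensor,
                         s_next: torch.Tensor) -> torch.Tensor:
    """Estimate log p_target(s'|s,a) - log p_source(s'|s,a) via Bayes' rule:
    the (s,a)-only classifier cancels the bias from which domain the
    state-action pairs were collected in."""
    logp_sas = torch.log_softmax(cls_sas(torch.cat([s, a, s_next], -1)), -1)
    logp_sa = torch.log_softmax(cls_sa(torch.cat([s, a], -1)), -1)
    return ((logp_sas[..., 1] - logp_sas[..., 0])
            - (logp_sa[..., 1] - logp_sa[..., 0]))
```

In such a setup the bonus would be scaled by a coefficient like the paper's β and added to the intrinsic (skill) reward when training in the source environment.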
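
The Pseudocode row points at Algorithm 1 on page 6, and the paper itself is the authoritative reference. Purely as orientation, here is a loose Python skeleton of how such a loop is commonly structured (abundant source rollouts, occasional limited target rollouts, classifier fitting, reward relabeling, off-policy update). Every helper callable and schedule here is an assumption, not a transcription of Algorithm 1.

```python
def train_dars_like(policy, source_env, target_env, rollout, fit_classifiers,
                    relabel_rewards, rl_update, n_iters=1000, target_every=10):
    """Hypothetical skeleton: the callables (rollout, fit_classifiers,
    relabel_rewards, rl_update) are assumed to be supplied by the user."""
    source_buf, target_buf = [], []
    for it in range(n_iters):
        source_buf.extend(rollout(policy, source_env))    # cheap experience
        if it % target_every == 0:                        # limited target rollouts
            target_buf.extend(rollout(policy, target_env))
        clfs = fit_classifiers(source_buf, target_buf)    # domain classifiers
        # Relabel source rewards with the dynamics-aware correction,
        # then take an off-policy RL step (e.g., SAC) on the result.
        rl_update(policy, relabel_rewards(source_buf, clfs))
    return policy
```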
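
The quoted hyperparameters (β = 10 and the source-vs-target experience ratio R = 10) admit a simple reading as a per-update mixing rule. A minimal sketch, assuming list-like replay buffers and a batch size that are not from the paper:

```python
import random

BETA = 10.0   # weight on the dynamics-aware reward term (paper: beta = 10)
RATIO = 10    # R: source vs. target experience ratio (paper: R = 10)

def sample_mixed_batch(source_buffer: list, target_buffer: list,
                       batch_size: int = 256) -> list:
    """Draw roughly RATIO source transitions per target transition."""
    n_target = max(1, batch_size // (RATIO + 1))
    n_source = batch_size - n_target
    batch = (random.sample(source_buffer, n_source)
             + random.sample(target_buffer, n_target))
    random.shuffle(batch)
    return batch
```

Whether R governs batch mixing or rollout collection is determined by Line 13 of Algorithm 1 in the paper; the sketch above shows only the batch-mixing interpretation.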