Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Authors: Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On discrete and continuous control tasks, we illustrate the mechanics of our approach and demonstrate its scalability to high-dimensional tasks. Our experiments will show that DARC outperforms alternative approaches, such as directly applying RL to the target domain or learning importance weights. We will also show that our method can account for domain shift in the termination condition, and confirm the importance of learning two classifiers. Figures 11 and 12 show the results of the ablation experiment from Fig. 7 run on all environments. |
| Researcher Affiliation | Collaboration | Benjamin Eysenbach (CMU, Google Brain) beysenba@cs.cmu.edu; Shreyas Chaudhari (CMU) shreyaschaudhari@cmu.edu; Swapnil Asawa (University of Pittsburgh) swa12@pitt.edu; Sergey Levine (UC Berkeley, Google Brain); Ruslan Salakhutdinov (CMU) |
| Pseudocode | Yes | Algorithm 1 Domain Adaptation with Rewards from Classifiers [DARC] |
| Open Source Code | No | The paper mentions that their DARC implementation is built on SAC from Guadarrama et al. (2018) and links to the codebases for MBPO and PETS baselines. However, it does not explicitly state that their own DARC implementation or code used for their experiments is open source or provide a link for it. |
| Open Datasets | Yes | We use three simulated robots taken from OpenAI Gym (Brockman et al., 2016): 7 DOF reacher, half cheetah, and ant. |
| Dataset Splits | No | No specific training, validation, or test split percentages or sample counts are provided in the paper. The paper mentions training until validation loss increased for the archery experiment but does not specify how validation sets were created or used for other tasks. |
| Hardware Specification | No | The acknowledgements section mentions "Dr. Paul Munro granting access to compute at CRC" (Center for Research Computing). However, it does not specify any particular hardware models (e.g., GPU, CPU models) or detailed specifications of the computing resources used for experiments. |
| Software Dependencies | No | The paper mentions building on "SAC from Guadarrama et al. (2018)" and using "Adam (Kingma & Ba, 2014)", which implies TensorFlow for SAC. However, specific version numbers for TensorFlow, SAC, or any other critical libraries/frameworks are not provided. |
| Experiment Setup | Yes | Our implementation of DARC is built on top of the implementation of SAC from Guadarrama et al. (2018). Unless otherwise specified, all hyperparameters are taken from Guadarrama et al. (2018). All neural networks (actor, critics, and classifiers) have two hidden layers with 256 units each and ReLU activations. Both classifiers use Gaussian input noise with σ = 1. Optimization of all networks is done with Adam (Kingma & Ba, 2014) with a learning rate of 3e-4 and batch size of 128. Most experiments with DARC collected 1 step in the target domain every 10 steps in the source domain (i.e., r = 10). We used twarmup = 1e5 for all tasks except the broken reacher, where we used twarmup = 2e5. *(A hedged code sketch based on the pseudocode and setup rows follows the table.)* |
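
The Pseudocode row (Algorithm 1, DARC) and the Experiment Setup row together describe two domain classifiers that supply a reward correction for training in the source domain. The following is a minimal sketch of that correction under the reported hyperparameters (two 256-unit ReLU hidden layers, Gaussian input noise with σ = 1, Adam at 3e-4, batch size 128). It is an illustration only: the paper's implementation builds on the TF-Agents SAC of Guadarrama et al. (2018), whereas this sketch uses PyTorch, and all class and function names here (`DarcClassifiers`, `delta_r`, `mlp`) are hypothetical.

```python
# Sketch of the DARC reward correction from two domain classifiers.
# Assumption: PyTorch instead of the paper's TF-Agents stack; names are illustrative.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim=2, hidden=256):
    # Two hidden layers of 256 ReLU units, as reported in the Experiment Setup row.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),  # logits for (source, target)
    )

class DarcClassifiers(nn.Module):
    """Two domain classifiers: q(domain | s, a, s') and q(domain | s, a)."""
    def __init__(self, obs_dim, act_dim, noise_std=1.0):
        super().__init__()
        self.q_sas = mlp(obs_dim + act_dim + obs_dim)
        self.q_sa = mlp(obs_dim + act_dim)
        self.noise_std = noise_std  # Gaussian input noise with sigma = 1

    def delta_r(self, s, a, s_next):
        """Correction added to the source-domain reward (Algorithm 1)."""
        sas = torch.cat([s, a, s_next], dim=-1)
        sa = torch.cat([s, a], dim=-1)
        if self.training:
            sas = sas + self.noise_std * torch.randn_like(sas)
            sa = sa + self.noise_std * torch.randn_like(sa)
        log_p_sas = torch.log_softmax(self.q_sas(sas), dim=-1)  # [:, 0]=source, [:, 1]=target
        log_p_sa = torch.log_softmax(self.q_sa(sa), dim=-1)
        # Delta r = log q(target|s,a,s') - log q(source|s,a,s')
        #         - log q(target|s,a)    + log q(source|s,a)
        return (log_p_sas[:, 1] - log_p_sas[:, 0]) - (log_p_sa[:, 1] - log_p_sa[:, 0])

# Training sketch: Adam with lr 3e-4 and batch size 128, matching the setup row.
# classifiers = DarcClassifiers(obs_dim, act_dim)
# opt = torch.optim.Adam(classifiers.parameters(), lr=3e-4)
# Fit both classifiers with cross-entropy on transitions labeled by domain
# (0 = source, 1 = target), then run SAC in the source domain on the modified
# reward r(s, a, s') + delta_r(s, a, s').
```

The two-classifier structure matters because the correction is a ratio of dynamics probabilities, not of marginal state-action visitation; the `q_sa` term cancels the part of `q_sas` that does not depend on the next state, which is why the paper's ablation confirms the importance of learning both classifiers.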