Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

Authors: Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On discrete and continuous control tasks, we illustrate the mechanics of our approach and demonstrate its scalability to high-dimensional tasks. Our experiments will show that DARC outperforms alternative approaches, such as directly applying RL to the target domain or learning importance weights. We will also show that our method can account for domain shift in the termination condition, and confirm the importance of learning two classifiers. Figures 11 and 12 show the results of the ablation experiment from Fig. 7 run on all environments.
Researcher Affiliation | Collaboration | Benjamin Eysenbach (CMU, Google Brain, beysenba@cs.cmu.edu); Shreyas Chaudhari (CMU, shreyaschaudhari@cmu.edu); Swapnil Asawa (University of Pittsburgh, swa12@pitt.edu); Sergey Levine (UC Berkeley, Google Brain); Ruslan Salakhutdinov (CMU)
Pseudocode | Yes | Algorithm 1: Domain Adaptation with Rewards from Classifiers (DARC). A minimal sketch of the classifier-based reward correction appears after the table.
Open Source Code | No | The paper mentions that the DARC implementation is built on SAC from Guadarrama et al. (2018) and links to the codebases for the MBPO and PETS baselines. However, it does not state that the authors' own DARC implementation or the code used for their experiments is open source, nor does it provide a link to it.
Open Datasets | Yes | We use three simulated robots taken from OpenAI Gym (Brockman et al., 2016): 7 DOF reacher, half cheetah, and ant.
Dataset Splits | No | No specific training, validation, or test split percentages or sample counts are provided in the paper. The paper mentions training until the validation loss increased for the archery experiment, but does not specify how validation sets were created or used for other tasks.
Hardware Specification | No | The acknowledgements section mentions "Dr. Paul Munro granting access to compute at CRC" (Center for Research Computing). However, it does not specify particular hardware models (e.g., GPU or CPU models) or detailed specifications of the computing resources used for the experiments.
Software Dependencies | No | The paper mentions building on "SAC from Guadarrama et al. (2018)" and using "Adam (Kingma & Ba, 2014)", which implies TensorFlow for SAC. However, specific version numbers for TensorFlow, SAC, or any other critical libraries/frameworks are not provided.
Experiment Setup | Yes | Our implementation of DARC is built on top of the implementation of SAC from Guadarrama et al. (2018). Unless otherwise specified, all hyperparameters are taken from Guadarrama et al. (2018). All neural networks (actor, critics, and classifiers) have two hidden layers with 256 units each and ReLU activations. Both classifiers use Gaussian input noise with σ = 1. Optimization of all networks is done with Adam (Kingma & Ba, 2014) with a learning rate of 3e-4 and batch size of 128. Most experiments with DARC collected 1 step in the target domain every 10 steps in the source domain (i.e., r = 10). We used t_warmup = 1e5 for all tasks except the broken reacher, where we used t_warmup = 2e5. An illustrative configuration summary also appears after the table.
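
Algorithm 1 (DARC) modifies the source-domain reward with a correction computed from two domain classifiers: one conditioned on (s, a, s') and one on (s, a). The snippet below is a minimal PyTorch sketch of that correction under the classifier architecture reported in the experiment setup (two 256-unit hidden layers, ReLU, Gaussian input noise with σ = 1); the function and variable names (make_classifier, darc_reward_correction) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_classifier(input_dim: int) -> nn.Module:
    """Binary domain classifier producing logits for [source, target]."""
    return nn.Sequential(
        nn.Linear(input_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 2),
    )


def darc_reward_correction(sas_clf, sa_clf, s, a, s_next, noise_std=1.0):
    """Delta_r(s, a, s') = [log p(target|s,a,s') - log p(source|s,a,s')]
                         - [log p(target|s,a)   - log p(source|s,a)].

    The agent trained in the source domain receives r + Delta_r.
    """
    sas = torch.cat([s, a, s_next], dim=-1)
    sa = torch.cat([s, a], dim=-1)
    # Gaussian input noise (sigma = 1) regularizes the classifiers, per the setup.
    sas = sas + noise_std * torch.randn_like(sas)
    sa = sa + noise_std * torch.randn_like(sa)
    log_p_sas = F.log_softmax(sas_clf(sas), dim=-1)  # columns: [source, target]
    log_p_sa = F.log_softmax(sa_clf(sa), dim=-1)
    return (log_p_sas[..., 1] - log_p_sas[..., 0]) - (log_p_sa[..., 1] - log_p_sa[..., 0])
```

Subtracting the (s, a) classifier's log-ratio cancels the marginal term, so the correction estimates log p_target(s'|s,a) - log p_source(s'|s,a), i.e., it penalizes transitions that are much more likely under the source dynamics than under the target dynamics.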
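The reported hyperparameters can also be summarized as a single configuration. The dictionary below is only an illustrative sketch: the key names are assumptions, while the values are taken from the experiment setup quoted above.

```python
# Illustrative summary of the reported experiment setup; key names are
# assumptions, values come from the paper's description.
DARC_CONFIG = {
    "hidden_layers": (256, 256),         # actor, critics, and classifiers
    "activation": "relu",
    "classifier_input_noise_std": 1.0,   # Gaussian input noise, sigma = 1
    "optimizer": "adam",                 # Adam (Kingma & Ba, 2014)
    "learning_rate": 3e-4,
    "batch_size": 128,
    "source_steps_per_target_step": 10,  # r = 10 in most experiments
    "t_warmup": 1e5,                     # 2e5 for the broken reacher
}
```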