Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Authors: Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On discrete and continuous control tasks, we illustrate the mechanics of our approach and demonstrate its scalability to high-dimensional tasks. Our experiments will show that DARC outperforms alternative approaches, such as directly applying RL to the target domain or learning importance weights. We will also show that our method can account for domain shift in the termination condition, and confirm the importance of learning two classifiers. Figures 11 and 12 show the results of the ablation experiment from Fig. 7 run on all environments. |
| Researcher Affiliation | Collaboration | Benjamin Eysenbach (CMU, Google Brain) beysenba@cs.cmu.edu; Shreyas Chaudhari (CMU) shreyaschaudhari@cmu.edu; Swapnil Asawa (University of Pittsburgh) swa12@pitt.edu; Sergey Levine (UC Berkeley, Google Brain); Ruslan Salakhutdinov (CMU) |
| Pseudocode | Yes | Algorithm 1 Domain Adaptation with Rewards from Classifiers [DARC] |
| Open Source Code | No | The paper mentions that their DARC implementation is built on SAC from Guadarrama et al. (2018) and links to the codebases for MBPO and PETS baselines. However, it does not explicitly state that their own DARC implementation or code used for their experiments is open source or provide a link for it. |
| Open Datasets | Yes | We use three simulated robots taken from OpenAI Gym (Brockman et al., 2016): 7 DOF reacher, half cheetah, and ant. |
| Dataset Splits | No | No specific training, validation, or test split percentages or sample counts are provided in the paper. The paper mentions training until validation loss increased for the archery experiment but does not specify how validation sets were created or used for other tasks. |
| Hardware Specification | No | The acknowledgements section mentions "Dr. Paul Munro granting access to compute at CRC" (Center for Research Computing). However, it does not specify any particular hardware models (e.g., GPU, CPU models) or detailed specifications of the computing resources used for experiments. |
| Software Dependencies | No | The paper mentions building on "SAC from Guadarrama et al. (2018)" and using "Adam (Kingma & Ba, 2014)", which implies TensorFlow for SAC. However, specific version numbers for TensorFlow, SAC, or any other critical libraries/frameworks are not provided. |
| Experiment Setup | Yes | Our implementation of DARC is built on top of the implementation of SAC from Guadarrama et al. (2018). Unless otherwise specified, all hyperparameters are taken from Guadarrama et al. (2018). All neural networks (actor, critics, and classifiers) have two hidden layers with 256 units each and ReLU activations. Both classifiers use Gaussian input noise with σ = 1. Optimization of all networks is done with Adam (Kingma & Ba, 2014) with a learning rate of 3e-4 and batch size of 128. Most experiments with DARC collected 1 step in the target domain every 10 steps in the source domain (i.e., r = 10). We used twarmup = 1e5 for all tasks except the broken reacher, where we used twarmup = 2e5. *(A hedged code sketch based on the pseudocode and setup rows follows the table.)* |
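
The Pseudocode row (Algorithm 1, DARC) and the Experiment Setup row together describe two domain classifiers that supply a reward correction for training in the source domain. The following is a minimal sketch of that correction under the reported hyperparameters (two 256-unit ReLU hidden layers, Gaussian input noise with σ = 1, Adam at 3e-4, batch size 128). It is an illustration only: the paper's implementation builds on the TF-Agents SAC of Guadarrama et al. (2018), whereas this sketch uses PyTorch, and all class and function names here (`DarcClassifiers`, `delta_r`, `mlp`) are hypothetical.

```python
# Sketch of the DARC reward correction from two domain classifiers.
# Assumption: PyTorch instead of the paper's TF-Agents stack; names are illustrative.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim=2, hidden=256):
    # Two hidden layers of 256 ReLU units, as reported in the Experiment Setup row.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),  # logits for (source, target)
    )

class DarcClassifiers(nn.Module):
    """Two domain classifiers: q(domain | s, a, s') and q(domain | s, a)."""
    def __init__(self, obs_dim, act_dim, noise_std=1.0):
        super().__init__()
        self.q_sas = mlp(obs_dim + act_dim + obs_dim)
        self.q_sa = mlp(obs_dim + act_dim)
        self.noise_std = noise_std  # Gaussian input noise with sigma = 1

    def delta_r(self, s, a, s_next):
        """Correction added to the source-domain reward (Algorithm 1)."""
        sas = torch.cat([s, a, s_next], dim=-1)
        sa = torch.cat([s, a], dim=-1)
        if self.training:
            sas = sas + self.noise_std * torch.randn_like(sas)
            sa = sa + self.noise_std * torch.randn_like(sa)
        log_p_sas = torch.log_softmax(self.q_sas(sas), dim=-1)  # [:, 0]=source, [:, 1]=target
        log_p_sa = torch.log_softmax(self.q_sa(sa), dim=-1)
        # Delta r = log q(target|s,a,s') - log q(source|s,a,s')
        #         - log q(target|s,a)    + log q(source|s,a)
        return (log_p_sas[:, 1] - log_p_sas[:, 0]) - (log_p_sa[:, 1] - log_p_sa[:, 0])

# Training sketch: Adam with lr 3e-4 and batch size 128, matching the setup row.
# classifiers = DarcClassifiers(obs_dim, act_dim)
# opt = torch.optim.Adam(classifiers.parameters(), lr=3e-4)
# Fit both classifiers with cross-entropy on transitions labeled by domain
# (0 = source, 1 = target), then run SAC in the source domain on the modified
# reward r(s, a, s') + delta_r(s, a, s').
```

The two-classifier structure matters because the correction is a ratio of dynamics probabilities, not of marginal state-action visitation; the `q_sa` term cancels the part of `q_sas` that does not depend on the next state, which is why the paper's ablation confirms the importance of learning both classifiers.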