AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

Authors: Biwei Huang, Fan Feng, Chaochao Lu, Sara Magliacane, Kun Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the efficacy of AdaRL through a series of experiments that vary factors in the observation, transition and reward functions for Cartpole and Atari games. As shown in Tables 1, 2 and 3, AdaRL consistently outperforms the baselines across most change factors in the MDP and POMDP cases for modified Cartpole, and in the POMDP case for Pong for Ntarget = 50.
Researcher Affiliation | Collaboration | Biwei Huang (Carnegie Mellon University, biweih@andrew.cmu.edu); Fan Feng (City University of Hong Kong, ffeng1017@gmail.com); Chaochao Lu (University of Cambridge & Max Planck Institute for Intelligent Systems, cl641@cam.ac.uk); Sara Magliacane (University of Amsterdam & MIT-IBM Watson AI Lab, sara.magliacane@gmail.com); Kun Zhang (Carnegie Mellon University & Mohamed bin Zayed University of Artificial Intelligence, kunz1@cmu.edu)
Pseudocode | Yes | We provide the pseudocode for the AdaRL algorithm in Alg. 1. Algorithm A1: Pseudocode of MiSS-VAE.
Open Source Code | Yes | Code link: https://github.com/Adaptive-RL/AdaRL-code. Source code is given at https://github.com/Adaptive-RL/AdaRL-code, which also provides a complete description of our experimental environment, configuration files, and instructions for reproducing our experiments.
Open Datasets | Yes | We modify the Cartpole and Atari Pong environments in OpenAI Gym (Brockman et al., 2016). All datasets used are publicly available, or instructions on how to generate them are provided in Appendices 5 and 6.
Dataset Splits | No | The paper uses source and target domains for training and evaluation, respectively, and discusses target-domain sample sizes (Ntarget), but does not specify conventional training/validation/test splits.
Hardware Specification | Yes | For the model estimation, the Cartpole and Pong experiments are run on 1 NVIDIA P100 GPU and 4 NVIDIA V100 GPUs, respectively. The policy learning stages in both experiments are run on 8 NVIDIA RTX 1080Ti GPUs.
Software Dependencies | No | In our code, we have used the following libraries, which are covered by the corresponding licenses: TensorFlow (Apache License 2.0), PyTorch (BSD 3-Clause "New" or "Revised" License), OpenAI Gym (MIT License), OpenCV (Apache 2 License), NumPy (BSD 3-Clause "New" or "Revised" License), and Keras (Apache License).
Experiment Setup | Yes | We use a random policy to collect sequence data from the source domains. For both the modified Cartpole and Pong experiments, the sequence length is 40 and the number of sequences is 10,000 per domain. The sampling resolution is set to 0.02. Other details are summarized in Table A16 ("Experimental details on the model estimation part"). A minimal data-collection sketch in this spirit is given below the table.
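
For concreteness, the quoted data-collection setup can be illustrated with a minimal sketch. It assumes the standard OpenAI Gym CartPole-v1 environment (classic Gym API) and treats gravity as one example of a change factor that differs across source domains; the specific gravity values, the restart-on-termination behaviour, and all names below are illustrative assumptions, not the authors' settings, whose actual implementation is at https://github.com/Adaptive-RL/AdaRL-code.

```python
# Minimal sketch (not the authors' code): collecting random-policy sequences
# from several modified Cartpole source domains, matching the quoted setup
# (sequence length 40, 10,000 sequences per domain). Assumes the classic
# OpenAI Gym API, where reset() returns obs and step() returns a 4-tuple.
import gym
import numpy as np

SEQ_LEN = 40          # sequence length quoted in the setup
N_SEQUENCES = 10000   # sequences per source domain quoted in the setup
SOURCE_GRAVITIES = [7.0, 9.8, 12.0]  # hypothetical per-domain change-factor values

def collect_domain(gravity, n_sequences=N_SEQUENCES, seq_len=SEQ_LEN):
    """Roll out a random policy in one source domain and return (states, actions)."""
    env = gym.make("CartPole-v1")
    env.unwrapped.gravity = gravity  # one example of a transition-change factor
    states = np.zeros((n_sequences, seq_len, env.observation_space.shape[0]))
    actions = np.zeros((n_sequences, seq_len), dtype=np.int64)
    for i in range(n_sequences):
        obs = env.reset()
        for t in range(seq_len):
            a = env.action_space.sample()        # random data-collection policy
            states[i, t], actions[i, t] = obs, a
            obs, _, done, _ = env.step(a)
            if done:
                obs = env.reset()                # simplification: restart if the pole falls early
    env.close()
    return states, actions

# One dataset per source domain, keyed by the varied change factor.
datasets = {g: collect_domain(g) for g in SOURCE_GRAVITIES}
```

Under these assumptions, each source domain yields a (10000, 40, 4) state array and a (10000, 40) action array, i.e., the kind of multi-domain sequence data on which a sequential model such as the MiSS-VAE of Alg. A1 could then be estimated.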