AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

Authors: Biwei Huang, Fan Feng, Chaochao Lu, Sara Magliacane, Kun Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the efficacy of AdaRL through a series of experiments that vary factors in the observation, transition and reward functions for Cartpole and Atari games. As shown in Tables 1, 2 and 3, AdaRL consistently outperforms the baselines across most change factors in the MDP and POMDP cases for modified Cartpole, and in the POMDP case for Pong for Ntarget = 50.
Researcher Affiliation | Collaboration | Biwei Huang (Carnegie Mellon University, biweih@andrew.cmu.edu); Fan Feng (City University of Hong Kong, ffeng1017@gmail.com); Chaochao Lu (University of Cambridge & Max Planck Institute for Intelligent Systems, cl641@cam.ac.uk); Sara Magliacane (University of Amsterdam & MIT-IBM Watson AI Lab, sara.magliacane@gmail.com); Kun Zhang (Carnegie Mellon University & Mohamed bin Zayed University of Artificial Intelligence, kunz1@cmu.edu)
Pseudocode | Yes | We provide the pseudocode for the AdaRL algorithm in Alg. 1. Algorithm A1: Pseudocode of MiSS-VAE.
Open Source Code | Yes | Code link: https://github.com/Adaptive-RL/AdaRL-code. Source code is given at https://github.com/Adaptive-RL/AdaRL-code, which also provides a complete description of our experimental environment, configuration files, and instructions for reproducing our experiments.
Open Datasets | Yes | We modify the Cartpole and Atari Pong environments in OpenAI Gym (Brockman et al., 2016). All datasets used are publicly available, or instructions on how to generate them are provided in Appendices 5 and 6.
Dataset Splits | No | The paper uses source and target domains for training and evaluation, respectively, and discusses target-domain sample sizes (Ntarget), but does not specify conventional training/validation/test splits.
Hardware Specification | Yes | For the model estimation, the Cartpole and Pong experiments are run on 1 NVIDIA P100 GPU and 4 NVIDIA V100 GPUs, respectively. The policy learning stages in both experiments are run on 8 NVIDIA RTX 1080Ti GPUs.
Software Dependencies | No | In our code, we have used the following libraries, which are covered by the corresponding licenses: TensorFlow (Apache License 2.0), PyTorch (BSD 3-Clause "New" or "Revised" License), OpenAI Gym (MIT License), OpenCV (Apache 2 License), NumPy (BSD 3-Clause "New" or "Revised" License), and Keras (Apache License).
Experiment Setup | Yes | We use a random policy to collect sequence data from the source domains. For both the modified Cartpole and Pong experiments, the sequence length is 40 and the number of sequences is 10,000 per domain. The sampling resolution is set to 0.02. Other details are summarized in Table A16 ("Experimental details on the model estimation part"). A minimal data-collection sketch in this spirit is given below the table.
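
For concreteness, the quoted data-collection setup can be illustrated with a minimal sketch. It assumes the standard OpenAI Gym CartPole-v1 environment (classic Gym API) and treats gravity as one example of a change factor that differs across source domains; the specific gravity values, the restart-on-termination behaviour, and all names below are illustrative assumptions, not the authors' settings, whose actual implementation is at https://github.com/Adaptive-RL/AdaRL-code.

```python
# Minimal sketch (not the authors' code): collecting random-policy sequences
# from several modified Cartpole source domains, matching the quoted setup
# (sequence length 40, 10,000 sequences per domain). Assumes the classic
# OpenAI Gym API, where reset() returns obs and step() returns a 4-tuple.
import gym
import numpy as np

SEQ_LEN = 40          # sequence length quoted in the setup
N_SEQUENCES = 10000   # sequences per source domain quoted in the setup
SOURCE_GRAVITIES = [7.0, 9.8, 12.0]  # hypothetical per-domain change-factor values

def collect_domain(gravity, n_sequences=N_SEQUENCES, seq_len=SEQ_LEN):
    """Roll out a random policy in one source domain and return (states, actions)."""
    env = gym.make("CartPole-v1")
    env.unwrapped.gravity = gravity  # one example of a transition-change factor
    states = np.zeros((n_sequences, seq_len, env.observation_space.shape[0]))
    actions = np.zeros((n_sequences, seq_len), dtype=np.int64)
    for i in range(n_sequences):
        obs = env.reset()
        for t in range(seq_len):
            a = env.action_space.sample()        # random data-collection policy
            states[i, t], actions[i, t] = obs, a
            obs, _, done, _ = env.step(a)
            if done:
                obs = env.reset()                # simplification: restart if the pole falls early
    env.close()
    return states, actions

# One dataset per source domain, keyed by the varied change factor.
datasets = {g: collect_domain(g) for g in SOURCE_GRAVITIES}
```

Under these assumptions, each source domain yields a (10000, 40, 4) state array and a (10000, 40) action array, i.e., the kind of multi-domain sequence data on which a sequential model such as the MiSS-VAE of Alg. A1 could then be estimated.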