Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

Authors: Biwei Huang, Fan Feng, Chaochao Lu, Sara Magliacane, Kun Zhang

ICLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We illustrate the efficacy of AdaRL through a series of experiments that vary factors in the observation, transition and reward functions for Cartpole and Atari games. As shown in Tables 1, 2 and 3, AdaRL consistently outperforms the baselines across most change factors in the MDP and POMDP case for modified Cartpole, and in the POMDP case for Pong for Ntarget = 50.
Researcher Affiliation Collaboration Biwei Huang (Carnegie Mellon University) EMAIL; Fan Feng (City University of Hong Kong) EMAIL; Chaochao Lu (University of Cambridge & Max Planck Institute for Intelligent Systems) EMAIL; Sara Magliacane (University of Amsterdam & MIT-IBM Watson AI Lab) EMAIL; Kun Zhang (Carnegie Mellon University & Mohamed bin Zayed University of Artificial Intelligence) EMAIL
Pseudocode Yes We provide the pseudocode for the AdaRL algorithm in Alg. 1. Algorithm A1: Pseudocode of MiSS-VAE.
Open Source Code Yes Code link: https://github.com/Adaptive-RL/AdaRL-code. Source code is given at https://github.com/Adaptive-RL/AdaRL-code, which also provides a complete description of our experimental environment, configuration files, and instructions for reproducing our experiments.
Open Datasets Yes We modify the Cartpole and Atari Pong environments in OpenAI Gym (Brockman et al., 2016). All datasets used are publicly available, or instructions on how to generate them are provided in Appendices 5 and 6.
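The modified Cartpole domains vary physical factors of the standard environment. A minimal sketch of how such domain variants could be created, assuming Gym's classic CartPoleEnv attribute names (`gravity`, `masspole`, `length`); the helper and the stand-in class below are illustrative, not the authors' code:

```python
def make_domain_variant(env, gravity=None, masspole=None, length=None):
    """Override physics parameters on a CartPole-style env in place.

    Gym wrappers expose the underlying env via .unwrapped; a bare env
    (or the stand-in below) is used directly.
    """
    base = getattr(env, "unwrapped", env)
    overrides = {"gravity": gravity, "masspole": masspole, "length": length}
    for name, value in overrides.items():
        if value is not None:
            setattr(base, name, value)
    return env


# Stand-in object with CartPole's default physics values; a real run
# would pass gym.make("CartPole-v1") instead.
class _FakeCartpole:
    gravity, masspole, length = 9.8, 0.1, 0.5


env = make_domain_variant(_FakeCartpole(), gravity=5.0)
print(env.gravity)  # 5.0 (masspole and length keep their defaults)
```

Each distinct parameter setting then plays the role of one source (or target) domain.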
Dataset Splits No The paper uses source and target domains for training and evaluation, respectively, and discusses sample sizes (Ntarget) for the target domain, but does not specify conventional dataset training/validation/test splits.
Hardware Specification Yes For the model estimation, the Cartpole and Pong experiments are implemented on 1 NVIDIA P100 GPU and 4 NVIDIA V100 GPUs, respectively. The policy learning stages in both experiments are implemented on 8 NVIDIA RTX 1080Ti GPUs.
Software Dependencies No In our code, we have used the following libraries, which are covered by the corresponding licenses: TensorFlow (Apache License 2.0), PyTorch (BSD 3-Clause "New" or "Revised" License), OpenAI Gym (MIT License), OpenCV (Apache 2 License), NumPy (BSD 3-Clause "New" or "Revised" License), and Keras (Apache License).
Experiment Setup Yes We use a random policy to collect sequence data from source domains. For both the modified Cartpole and Pong experiments, the sequence length is 40 and the number of sequences is 10000 for each domain. The sampling resolution is set to 0.02. Other details are summarized in Table A16. Table A16: Experimental details on the model estimation part.
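The data-collection step above (random policy, fixed-length sequences per domain) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the tiny `StubEnv` stands in for the modified Cartpole/Pong environments, and the small sequence count keeps the example fast (the paper uses 10000 sequences of length 40 per domain).

```python
import numpy as np


class StubEnv:
    """Minimal Gym-like environment: 4-dim observations, 2 discrete actions."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        return self.rng.standard_normal(4)

    def step(self, action):
        obs = self.rng.standard_normal(4)
        return obs, 0.0, False, {}  # obs, reward, done, info


def collect_sequences(env, n_sequences, seq_len, n_actions=2, seed=0):
    """Roll out fixed-length observation sequences under a random policy."""
    rng = np.random.default_rng(seed)
    data = np.empty((n_sequences, seq_len, 4))
    for i in range(n_sequences):
        obs = env.reset()
        for t in range(seq_len):
            data[i, t] = obs
            action = rng.integers(n_actions)  # uniformly random policy
            obs, _, done, _ = env.step(action)
            if done:
                obs = env.reset()
    return data


data = collect_sequences(StubEnv(), n_sequences=8, seq_len=40)
print(data.shape)  # (8, 40, 4)
```

In the paper's setting, this loop would be repeated once per source domain to build the multi-domain training set for model estimation.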