Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods

Authors: Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm H Van Seijen

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method, by experimenting with a nonlinear version of Dyna."
Researcher Affiliation | Collaboration | "¹University of Alberta, ²Mila – Québec AI Institute, ³Université de Montréal, ⁴Microsoft, ⁵École Polytechnique de Montréal, ⁶Canada CIFAR AI Chair."
Pseudocode | Yes | Algorithm 1 (Adaptive Linear Dyna) and Algorithm 2 (Nonlinear Dyna Q) are provided in Appendix D.
Open Source Code | Yes | "The code of all of the experiments presented in this paper is available at https://github.com/chandar-lab/LoCA2."
Open Datasets | Yes | "We introduce a variation on the Reacher environment (the easy version) available from the DeepMind Control Suite (Tassa et al., 2018)."
Dataset Splits | Yes | "We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10⁻⁵, 10⁻⁴, 3×10⁻⁴, 10⁻³}, and the discount factor γ ∈ {0.99, 0.999}. The best hyperparameter setting is the one that performs the best in Phase 2, among those that perform well in Phase 1."
Hardware Specification | No | The paper mentions that "Compute Canada and Calcul Québec" provided computing resources, but does not specify particular hardware details such as CPU or GPU models.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019)" in the context of optimizer parameters but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | "We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10⁻⁵, 10⁻⁴, 3×10⁻⁴, 10⁻³}, and the discount factor γ ∈ {0.99, 0.999}. Details of the experiment setup are summarized in Sections B.1 and B.4."
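The experiment-setup quote above describes a full grid search over three hyperparameters. As a rough illustration only (not the authors' code; the `evaluate` function is a hypothetical placeholder for the training-and-evaluation loop in their LoCA2 repository), the search space can be enumerated like this:

```python
from itertools import product

# Search ranges quoted in the paper's experiment setup.
kl_scales = [0.1, 0.3, 1, 3]                # KL loss scale (beta)
entropy_scales = [1e-5, 1e-4, 3e-4, 1e-3]   # actor entropy loss scale (eta)
discounts = [0.99, 0.999]                   # discount factor (gamma)

# Full Cartesian product: 4 * 4 * 2 = 32 candidate configurations.
grid = list(product(kl_scales, entropy_scales, discounts))
print(len(grid))

def evaluate(beta, eta, gamma):
    """Hypothetical placeholder: would train the agent and return
    (phase_1_score, phase_2_score) for one hyperparameter setting."""
    raise NotImplementedError
```

Per the selection rule quoted under Dataset Splits, the chosen setting would be the one with the best Phase 2 score among configurations that also perform well in Phase 1, rather than the best Phase 1 score alone.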