Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
Authors: Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm H. van Seijen
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method, by experimenting with a nonlinear version of Dyna. |
| Researcher Affiliation | Collaboration | ¹University of Alberta, ²Mila – Quebec AI Institute, ³Université de Montréal, ⁴Microsoft, ⁵École Polytechnique de Montréal, ⁶Canada CIFAR AI Chair. |
| Pseudocode | Yes | Algorithm 1 (Adaptive Linear Dyna) and Algorithm 2 (Nonlinear Dyna Q) are provided in Appendix D. |
| Open Source Code | Yes | The code of all of the experiments presented in this paper is available at https://github.com/chandar-lab/LoCA2. |
| Open Datasets | Yes | We introduce a variation on the Reacher environment (the easy version) available from the DeepMind Control Suite (Tassa et al., 2018). |
| Dataset Splits | Yes | We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10⁻⁵, 10⁻⁴, 3×10⁻⁴, 10⁻³}, and the discount factor γ ∈ {0.99, 0.999}. The best hyperparameter setting is the one that performs the best in Phase 2, among those that perform well in Phase 1. |
| Hardware Specification | No | The paper mentions that "Compute Canada and Calcul Quebec" provided computing resources, but does not specify any particular hardware details like CPU or GPU models. |
| Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019)" in the context of optimizer parameters but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10⁻⁵, 10⁻⁴, 3×10⁻⁴, 10⁻³}, and the discount factor γ ∈ {0.99, 0.999}. Details of the experiment setup are summarized in Sections B.1 and B.4. (A hedged sketch of this sweep appears after the table.) |
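
The sweep and selection rule quoted above can be reproduced with a plain grid search. Below is a minimal sketch, assuming hypothetical helpers `run_phase1` and `run_phase2` that each train a configuration and return a scalar performance score, and an illustrative cutoff `PHASE1_THRESHOLD` for "performs well in Phase 1"; the grid values and the Phase-1/Phase-2 selection rule come from the paper's quoted setup, while everything else here is an assumption for illustration.

```python
from itertools import product

# Grid values quoted from the paper's experiment setup (Sections B.1 and B.4).
BETAS = [0.1, 0.3, 1, 3]           # KL loss scale β
ETAS = [1e-5, 1e-4, 3e-4, 1e-3]    # actor entropy loss scale η
GAMMAS = [0.99, 0.999]             # discount factor γ

# Hypothetical cutoff for "performs well in Phase 1"; the paper does not
# publish a numeric threshold, so this value is an illustrative assumption.
PHASE1_THRESHOLD = 0.8


def run_phase1(beta, eta, gamma):
    """Placeholder: train in Phase 1 and return a scalar performance score."""
    raise NotImplementedError


def run_phase2(beta, eta, gamma):
    """Placeholder: continue training after the local change (Phase 2)
    and return a scalar performance score."""
    raise NotImplementedError


def select_best_config():
    """Grid search mirroring the paper's quoted rule: among configurations
    that perform well in Phase 1, pick the one that performs best in Phase 2."""
    candidates = [
        cfg
        for cfg in product(BETAS, ETAS, GAMMAS)
        if run_phase1(*cfg) >= PHASE1_THRESHOLD
    ]
    return max(candidates, key=lambda cfg: run_phase2(*cfg))
```

Filtering on Phase 1 before ranking on Phase 2 matches the two-stage criterion quoted in the table; how "performs well" is operationalized is left open by the excerpt, so the threshold form used here is only one plausible reading.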