Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
Authors: Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm H Van Seijen
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method, by experimenting with a nonlinear version of Dyna. |
| Researcher Affiliation | Collaboration | 1 University of Alberta; 2 Mila Quebec AI Institute; 3 Université de Montréal; 4 Microsoft; 5 École Polytechnique de Montréal; 6 Canada CIFAR AI Chair. |
| Pseudocode | Yes | Algorithm 1 Adaptive Linear Dyna and Algorithm 2 Nonlinear Dyna Q are provided in Appendix D. |
| Open Source Code | Yes | The code of all of the experiments presented in this paper is available at https://github.com/chandar-lab/LoCA2. |
| Open Datasets | Yes | We introduce a variation on the Reacher environment (the easy version) available from the DeepMind Control Suite (Tassa et al., 2018). |
| Dataset Splits | Yes | We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10⁻⁵, 10⁻⁴, 3 × 10⁻⁴, 10⁻³}, and the discount factor γ ∈ {0.99, 0.999}. The best hyperparameter setting is the one that performs the best in Phase 2, among those that perform well in Phase 1. |
| Hardware Specification | No | The paper mentions that "Compute Canada and Calcul Quebec" provided computing resources, but does not specify any particular hardware details like CPU or GPU models. |
| Software Dependencies | No | The paper mentions "Pytorch (Paszke et al., 2019)" in the context of optimizer parameters but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10⁻⁵, 10⁻⁴, 3 × 10⁻⁴, 10⁻³}, and the discount factor γ ∈ {0.99, 0.999}. Details of the experiment setup are summarized in Sections B.1 and B.4. |
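
The hyperparameter search quoted in the table can be sketched as a small grid plus a two-stage selection rule. This is only an illustrative sketch, not the authors' code: the function names, the dictionary keys, and the `phase1_threshold` parameter are hypothetical, the entropy scales are read as 10⁻⁵ through 10⁻³, and the paper's criterion for "performs well in Phase 1" is not given in this summary, so a simple score threshold stands in for it.

```python
from itertools import product

# Grid values as quoted in the table (entropy scales read as negative powers of ten).
KL_SCALES = [0.1, 0.3, 1, 3]               # beta
ENTROPY_SCALES = [1e-5, 1e-4, 3e-4, 1e-3]  # eta
DISCOUNTS = [0.99, 0.999]                  # gamma


def hyperparameter_grid():
    """Enumerate every (beta, eta, gamma) combination in the search space."""
    return [
        {"kl_scale": beta, "entropy_scale": eta, "discount": gamma}
        for beta, eta, gamma in product(KL_SCALES, ENTROPY_SCALES, DISCOUNTS)
    ]


def select_best(results, phase1_threshold):
    """Pick the config with the best Phase 2 score among those passing Phase 1.

    `results` is a list of (config, phase1_score, phase2_score) tuples.
    `phase1_threshold` is a hypothetical stand-in for "performs well in Phase 1".
    Returns None when no configuration passes the Phase 1 filter.
    """
    eligible = [r for r in results if r[1] >= phase1_threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r[2])[0]
```

Filtering on Phase 1 before ranking on Phase 2 mirrors the quoted rule: a configuration that never learns the initial task is excluded even if its Phase 2 score happens to be high.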