Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods

Authors: Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm H. van Seijen

ICML 2022 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method by experimenting with a nonlinear version of Dyna. |
| Researcher Affiliation | Collaboration | ¹University of Alberta, ²Mila Quebec AI Institute, ³Université de Montréal, ⁴Microsoft, ⁵École Polytechnique de Montréal, ⁶Canada CIFAR AI Chair. |
| Pseudocode | Yes | Algorithm 1 (Adaptive Linear Dyna) and Algorithm 2 (Nonlinear Dyna Q) are provided in Appendix D (see the Dyna-Q sketch below the table). |
| Open Source Code | Yes | The code of all of the experiments presented in this paper is available at https://github.com/chandar-lab/LoCA2. |
| Open Datasets | Yes | We introduce a variation on the Reacher environment (the easy version) available from the DeepMind Control Suite (Tassa et al., 2018) (see the environment-loading sketch below the table). |
| Dataset Splits | Yes | We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10⁻⁵, 10⁻⁴, 3×10⁻⁴, 10⁻³}, and the discount factor γ ∈ {0.99, 0.999}. The best hyperparameter setting is the one that performs the best in Phase 2, among those that perform well in Phase 1. |
| Hardware Specification | No | The paper mentions that "Compute Canada and Calcul Québec" provided computing resources, but does not specify any particular hardware details such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2019)" in the context of optimizer parameters but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10⁻⁵, 10⁻⁴, 3×10⁻⁴, 10⁻³}, and the discount factor γ ∈ {0.99, 0.999}. Details of the experiment setup are summarized in Sections B.1 and B.4 (see the grid-search sketch below the table). |
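For orientation on the Pseudocode row: both of the paper's algorithms build on Sutton's classic Dyna-Q loop, which interleaves direct updates from real experience with planning updates drawn from a learned model. The following is a minimal tabular Dyna-Q sketch, not the paper's Adaptive Linear Dyna or Nonlinear Dyna Q (those are in Appendix D of the paper); the `env` interface is a hypothetical placeholder.

```python
import random
from collections import defaultdict

# Minimal tabular Dyna-Q sketch (Sutton, 1990). Illustrative only; the paper's
# Algorithm 1 (Adaptive Linear Dyna) and Algorithm 2 (Nonlinear Dyna Q) differ.
# `env` is a hypothetical object: reset() -> state, step(a) -> (state, reward, done).
def dyna_q(env, n_actions, episodes=100, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)   # Q[(state, action)] -> action-value estimate
    model = {}               # model[(state, action)] -> (reward, next_state)

    def greedy(s):
        return max(range(n_actions), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.randrange(n_actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)
            # Direct RL: Q-learning update from the real transition.
            target = r + (0.0 if done else gamma * Q[(s2, greedy(s2))])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # Model learning: remember the last observed outcome of (s, a).
            model[(s, a)] = (r, s2)
            # Planning: replay simulated transitions sampled from the model
            # (terminal transitions are not treated specially in this sketch).
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                Q[(ps, pa)] += alpha * (pr + gamma * Q[(ps2, greedy(ps2))] - Q[(ps, pa)])
            s = s2
    return Q
```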
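The Open Datasets row refers to a variation of the Reacher (easy) task from the DeepMind Control Suite. The sketch below loads the stock task via the `dm_control` package; the paper's modified variant lives in the authors' repository and is not reconstructed here.

```python
# Hedged sketch: loading the stock Reacher "easy" task from the DeepMind
# Control Suite (Tassa et al., 2018). Requires `pip install dm_control`.
# The paper evaluates a *modified* variation of this environment, which is
# provided in https://github.com/chandar-lab/LoCA2 rather than shown here.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="reacher", task_name="easy")
action_spec = env.action_spec()
time_step = env.reset()

while not time_step.last():
    # Uniform-random policy, purely to show the interaction loop.
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
```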
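The Dataset Splits and Experiment Setup rows quote the same hyperparameter search. The sketch below simply enumerates that grid and applies the quoted selection rule; `evaluate` and `PHASE1_THRESHOLD` are hypothetical stand-ins for the paper's actual training runs and its "performs well in Phase 1" criterion.

```python
from itertools import product

# Hyperparameter grid quoted in the table: KL loss scale (beta), actor entropy
# loss scale (eta), and discount factor (gamma).
betas = [0.1, 0.3, 1, 3]
etas = [1e-5, 1e-4, 3e-4, 1e-3]
gammas = [0.99, 0.999]

def evaluate(beta, eta, gamma):
    # Hypothetical stand-in: train one run and return its Phase 1 / Phase 2
    # performance. Dummy values here; replace with the actual training loop.
    return {"phase1": 0.0, "phase2": 0.0}

results = {cfg: evaluate(*cfg) for cfg in product(betas, etas, gammas)}

# Selection rule paraphrased from the quoted text: best Phase 2 performance
# among settings that perform well in Phase 1. The threshold is hypothetical.
PHASE1_THRESHOLD = 0.0
eligible = {cfg: r for cfg, r in results.items() if r["phase1"] >= PHASE1_THRESHOLD}
best_cfg = max(eligible, key=lambda cfg: eligible[cfg]["phase2"])
```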