reproducibilityindex.ai

Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods

Authors: Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm H Van Seijen

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically validate these insights in the case of linear function approximation by demonstrating that a modified version of linear Dyna achieves effective adaptation to local changes. Furthermore, we provide detailed insights into the challenges of building an adaptive nonlinear model-based method, by experimenting with a nonlinear version of Dyna.
Researcher Affiliation	Collaboration	1 University of Alberta, 2 Mila Quebec AI Institute, 3 Université de Montréal, 4 Microsoft, 5 École Polytechnique de Montréal, 6 Canada CIFAR AI Chair.
Pseudocode	Yes	Algorithm 1 Adaptive Linear Dyna and Algorithm 2 Nonlinear Dyna Q are provided in Appendix D.
Open Source Code	Yes	The code of all of the experiments presented in this paper is available at https://github.com/chandar-lab/LoCA2.
Open Datasets	Yes	We introduce a variation on the Reacher environment (the easy version) available from the DeepMind Control Suite (Tassa et al., 2018).
Dataset Splits	Yes	We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10^-5, 10^-4, 3 × 10^-4, 10^-3}, and the discount factor γ ∈ {0.99, 0.999}. The best hyperparameter setting is the one that performs the best in Phase 2, among those that perform well in Phase 1.
Hardware Specification	No	The paper mentions that "Compute Canada and Calcul Quebec" provided computing resources, but does not specify any particular hardware details like CPU or GPU models.
Software Dependencies	No	The paper mentions "Pytorch (Paszke et al., 2019)" in context of optimizer parameters but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup	Yes	We searched over the KL loss scale β ∈ {0.1, 0.3, 1, 3}, the actor entropy loss scale η ∈ {10^-5, 10^-4, 3 × 10^-4, 10^-3}, and the discount factor γ ∈ {0.99, 0.999}. Details of the experiment setup are summarized in Sections B.1 and B.4.
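
The hyperparameter search quoted in the table can be sketched in a few lines of Python. This is an illustrative reading of the quoted text, not the authors' code. The exponents of the η values are assumed to be negative powers of ten, the names `GRID`, `enumerate_configs`, and `select_best` are hypothetical, and the numeric threshold for "performs well in Phase 1" is a placeholder because the quoted text does not define one.

```python
from itertools import product

# Search grid as quoted in the table. The eta exponents are assumed
# to be negative powers of ten (1e-5, 1e-4, 3e-4, 1e-3).
GRID = {
    "kl_scale": [0.1, 0.3, 1.0, 3.0],                  # beta
    "actor_entropy_scale": [1e-5, 1e-4, 3e-4, 1e-3],   # eta
    "discount": [0.99, 0.999],                         # gamma
}


def enumerate_configs(grid):
    """Return every combination of the grid values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]


def select_best(results, phase1_threshold):
    """Pick the config with the highest Phase 2 score among those whose
    Phase 1 score reaches `phase1_threshold`.

    `results` is a list of (config, phase1_score, phase2_score) tuples.
    The threshold is a hypothetical stand-in for "performs well in
    Phase 1". Returns None if no config qualifies.
    """
    eligible = [r for r in results if r[1] >= phase1_threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r[2])[0]


configs = enumerate_configs(GRID)  # 4 * 4 * 2 = 32 settings
```

Filtering on Phase 1 before ranking on Phase 2 follows the order of the quoted selection rule, so a setting that does well only after the environment changes is not chosen if it failed to perform well in Phase 1.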