reproducibilityindex.ai

Maximum Entropy Model Correction in Reinforcement Learning

Authors: Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically show the effectiveness of MoCoVI and MoCoDyna to utilize an approximate model. We consider the 6x6 grid world environment with four actions introduced by Rakhsha et al. (2022), with gamma = 0.9. We defer the details of the environment to the supplementary material. As shown in Theorem 2, the convergence rate of MoCoVI depends on the model error and d. We introduce error
Researcher Affiliation	Collaboration	Amin Rakhsha (1,2), Mete Kemertas (1,2), Mohammad Ghavamzadeh (3), Amir-massoud Farahmand (1,2). (1) Department of Computer Science, University of Toronto; (2) Vector Institute; (3) Amazon
Pseudocode	Yes	Algorithm 1: MoCoDyna(T, d, c, beta, K)
Open Source Code	No	The paper does not contain any explicit statements about releasing source code or provide a link to a code repository for the methodology described.
Open Datasets	Yes	We consider the 6x6 grid world environment with four actions introduced by Rakhsha et al. (2022), with gamma = 0.9. We defer the details of the environment to the supplementary material.
Dataset Splits	No	The paper describes the environment and the data generation process (sampling from the environment) but does not specify any training, validation, or test dataset splits.
Hardware Specification	No	The paper mentions general support from organizations ('Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.'), but it does not provide specific details about the hardware (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies	No	The paper mentions using the 'BFGS algorithm in SciPy library' but does not specify the version numbers for SciPy or any other software dependencies.
Experiment Setup	Yes	The hyperparameters of MoCoDyna for PE and control problems are given in Tables 3 and 4.
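
The Software Dependencies row notes that the paper uses the BFGS algorithm from SciPy without stating a version, so a rerun would benefit from recording the installed version next to the optimizer call. The following is a minimal sketch, not the authors' implementation: it applies a generic minimum-KL (maximum entropy) correction to a distribution under a moment constraint, solved through its dual with `scipy.optimize.minimize(method="BFGS")`. The function name `correct_distribution` and the toy values of `p_hat`, `phi`, and `target` are hypothetical and are not taken from the paper.

```python
import numpy as np
import scipy
from scipy.optimize import minimize
from scipy.special import logsumexp


def correct_distribution(p_hat, phi, target):
    """Return q minimizing KL(q || p_hat) subject to E_q[phi] = target.

    p_hat  : (n,) strictly positive probabilities (hypothetical approximate model)
    phi    : (n, d) feature matrix (hypothetical basis functions)
    target : (d,) desired feature expectations
    """
    log_p = np.log(p_hat)

    def dual(lam):
        # Convex dual objective: log-partition function minus the linear term.
        return logsumexp(log_p + phi @ lam) - lam @ target

    def grad(lam):
        # Gradient of the dual is the constraint violation E_q[phi] - target.
        logits = log_p + phi @ lam
        q = np.exp(logits - logsumexp(logits))
        return phi.T @ q - target

    res = minimize(dual, np.zeros(phi.shape[1]), jac=grad, method="BFGS")

    # Recover the corrected distribution q proportional to p_hat * exp(phi @ lam).
    logits = log_p + phi @ res.x
    q = np.exp(logits - logsumexp(logits))
    return q, res


# Toy example (assumed values): uniform model over 4 outcomes, one feature.
p_hat = np.full(4, 0.25)
phi = np.array([[0.0], [1.0], [2.0], [3.0]])
target = np.array([2.0])  # the uniform model gives 1.5, so a correction is needed

q, res = correct_distribution(p_hat, phi, target)

# Record the library version, since the paper does not state it.
print("SciPy version:", scipy.__version__)
print("corrected distribution:", q)
print("corrected expectation:", phi.T @ q)
```

Logging `scipy.__version__` with the results is one inexpensive way to close the gap flagged in the table, since optimizer defaults can differ between releases.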