Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Maximum Entropy Model Correction in Reinforcement Learning
Authors: Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show the effectiveness of MoCoVI and MoCoDyna to utilize an approximate model. We consider the 6x6 grid world environment with four actions introduced by Rakhsha et al. (2022), with gamma = 0.9. We defer the details of the environment to the supplementary material. As shown in Theorem 2, the convergence rate of MoCoVI depends on the model error and d. We introduce error |
| Researcher Affiliation | Collaboration | Amin Rakhsha (1, 2), Mete Kemertas (1, 2), Mohammad Ghavamzadeh (3), Amir-massoud Farahmand (1, 2); (1) Department of Computer Science, University of Toronto; (2) Vector Institute; (3) Amazon |
| Pseudocode | Yes | Algorithm 1: MoCoDyna(T, d, c, beta, K) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We consider the 6x6 grid world environment with four actions introduced by Rakhsha et al. (2022), with gamma = 0.9. We defer the details of the environment to the supplementary material. |
| Dataset Splits | No | The paper describes the environment and the data generation process (sampling from the environment) but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper mentions general support from organizations ('Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.'), but it does not provide specific details about the hardware (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'BFGS algorithm in SciPy library' but does not specify the version numbers for SciPy or any other software dependencies. |
| Experiment Setup | Yes | The hyperparameters of MoCoDyna for PE and control problems are given in Tables 3 and 4. |
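
The Software Dependencies row notes that the paper relies on the BFGS algorithm from SciPy without stating a version. As a minimal sketch, and not the authors' implementation, the snippet below shows how a generic maximum-entropy (minimum-KL) correction of a discrete distribution can be solved through its convex dual with `scipy.optimize.minimize(method="BFGS")`, and how the SciPy version can be recorded alongside the result. The function names `dual_objective` and `maxent_correct`, the toy base distribution `q`, the feature matrix `phi`, and the target moment are all illustrative assumptions; none of them come from the paper.

```python
import numpy as np
import scipy
from scipy.optimize import minimize
from scipy.special import logsumexp


def dual_objective(lam, log_q, phi, target):
    # Dual of: minimize KL(p || q) subject to E_p[phi] = target.
    # It is the log-partition of the tilted distribution minus lam . target.
    return logsumexp(log_q + phi @ lam) - lam @ target


def maxent_correct(q, phi, target):
    # Hypothetical helper: returns the tilted distribution p proportional to
    # q * exp(phi @ lam), with lam found by BFGS on the dual objective.
    log_q = np.log(q)  # assumes q is strictly positive
    res = minimize(
        dual_objective,
        x0=np.zeros(phi.shape[1]),
        args=(log_q, phi, target),
        method="BFGS",
    )
    logits = log_q + phi @ res.x
    p = np.exp(logits - logsumexp(logits))
    return p, res


# Toy example (assumed values): a uniform base distribution over 4 outcomes
# whose mean under q is 1.5, corrected so that the mean becomes 2.0.
q = np.array([0.25, 0.25, 0.25, 0.25])
phi = np.array([[0.0], [1.0], [2.0], [3.0]])
target = np.array([2.0])

p, res = maxent_correct(q, phi, target)

# Record the library version next to the result for reproducibility.
print("SciPy version:", scipy.__version__)
print("corrected distribution:", p)
print("corrected mean:", float(p @ phi[:, 0]))
```

Logging `scipy.__version__` (or pinning it in a requirements file) is one simple way to close the gap flagged in the table, since optimizer defaults can differ between releases.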