Maximum Entropy Model Correction in Reinforcement Learning
Authors: Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show the effectiveness of MoCoVI and MoCoDyna to utilize an approximate model. We consider the 6x6 grid world environment with four actions introduced by Rakhsha et al. (2022), with gamma = 0.9. We defer the details of the environment to the supplementary material. As shown in Theorem 2, the convergence rate of MoCoVI depends on the model error and d. We introduce error |
| Researcher Affiliation | Collaboration | Amin Rakhsha (1,2), Mete Kemertas (1,2), Mohammad Ghavamzadeh (3), Amir-massoud Farahmand (1,2). (1) Department of Computer Science, University of Toronto; (2) Vector Institute; (3) Amazon |
| Pseudocode | Yes | Algorithm 1 MoCoDyna(T, d, c, beta, K) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | We consider the 6x6 grid world environment with four actions introduced by Rakhsha et al. (2022), with gamma = 0.9. We defer the details of the environment to the supplementary material. |
| Dataset Splits | No | The paper describes the environment and the data generation process (sampling from the environment) but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper mentions general support from organizations ('Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.'), but it does not provide specific details about the hardware (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'BFGS algorithm in SciPy library' but does not specify the version numbers for SciPy or any other software dependencies. |
| Experiment Setup | Yes | The hyperparameters of MoCoDyna for PE and control problems are given in Tables 3 and 4. |
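
The Software Dependencies row notes that BFGS from SciPy is used but that no version is reported. The sketch below shows one way such a dependency could be recorded alongside a BFGS call. It is not the paper's implementation: the function `correct_distribution`, the feature matrix `phi`, and the toy numbers are hypothetical, and the objective is the textbook dual of a minimum-KL (maximum entropy) correction with equality constraints, chosen only for illustration.

```python
import numpy as np
import scipy
from scipy.optimize import minimize
from scipy.special import logsumexp


def correct_distribution(p_hat, phi, target):
    """Return q minimizing KL(q || p_hat) subject to E_q[phi] = target.

    Hypothetical helper (not from the paper). Assumes p_hat > 0 everywhere,
    phi has shape (n_outcomes, n_features), target has shape (n_features,).
    """
    log_p = np.log(p_hat)

    def dual(lam):
        # Convex dual: log-partition function minus lam . target.
        return logsumexp(log_p + phi @ lam) - lam @ target

    def grad(lam):
        # Gradient is E_q[phi] - target for the current tilted distribution q.
        logits = log_p + phi @ lam
        q = np.exp(logits - logsumexp(logits))
        return phi.T @ q - target

    res = minimize(dual, np.zeros(phi.shape[1]), jac=grad, method="BFGS")
    logits = log_p + phi @ res.x
    return np.exp(logits - logsumexp(logits))


# Record the library version next to the results for reproducibility.
print("SciPy version:", scipy.__version__)

# Toy example: tilt a uniform distribution over {0, 1, 2, 3} to have mean 2.
p_hat = np.array([0.25, 0.25, 0.25, 0.25])
phi = np.array([[0.0], [1.0], [2.0], [3.0]])
q = correct_distribution(p_hat, phi, np.array([2.0]))
print("corrected distribution:", q)
```

Logging `scipy.__version__` with each run is one simple way to close the gap flagged in the table; the toy constraint here stands in for whatever quantities an actual experiment would match.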