Maximum Entropy Model Correction in Reinforcement Learning

Authors: Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud Farahmand

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show the effectiveness of MoCoVI and MoCoDyna to utilize an approximate model. We consider the 6x6 grid world environment with four actions introduced by Rakhsha et al. (2022), with gamma = 0.9. We defer the details of the environment to the supplementary material. As shown in Theorem 2, the convergence rate of MoCoVI depends on the model error and d. We introduce error
Researcher Affiliation | Collaboration | Amin Rakhsha (1, 2), Mete Kemertas (1, 2), Mohammad Ghavamzadeh (3), Amir-massoud Farahmand (1, 2). (1) Department of Computer Science, University of Toronto; (2) Vector Institute; (3) Amazon
Pseudocode | Yes | Algorithm 1: MoCoDyna(T, d, c, beta, K)
Open Source Code | No | The paper does not state that source code will be released and provides no link to a code repository for the described methodology.
Open Datasets | Yes | We consider the 6x6 grid world environment with four actions introduced by Rakhsha et al. (2022), with gamma = 0.9. We defer the details of the environment to the supplementary material.
Dataset Splits | No | The paper describes the environment and the data generation process (sampling from the environment) but does not specify any training, validation, or test dataset splits.
Hardware Specification | No | The paper acknowledges general support from organizations ('Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.'), but it gives no specific details about the hardware (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | No | The paper mentions using the 'BFGS algorithm in SciPy library' but does not specify version numbers for SciPy or any other software dependency.
Experiment Setup | Yes | The hyperparameters of MoCoDyna for PE and control problems are given in Tables 3 and 4.
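
Since no source code is released, anyone reproducing the experiments has to rebuild the setup from the description above. The following is a minimal sketch of that setting, not the paper's method: it runs plain value iteration (not MoCoVI) on a 6x6 grid with four actions and gamma = 0.9. The dynamics, the reward, and the names `GOAL`, `step`, `reward`, and `value_iteration` are assumptions made for illustration, because the actual environment details are deferred to the supplementary material.

```python
# Plain value iteration on a hypothetical 6x6 grid world (NOT MoCoVI).
GAMMA = 0.9   # discount factor stated in the summary above
SIZE = 6      # 6x6 grid, as stated in the summary above
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # four actions: up, down, left, right
GOAL = (5, 5)  # assumption: one rewarding cell; not taken from the paper


def step(state, action):
    """Assumed deterministic dynamics: move one cell, stay in place at walls."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    return (row, col)


def reward(state):
    """Assumed reward: 1 in the goal cell, 0 everywhere else."""
    return 1.0 if state == GOAL else 0.0


def value_iteration(tol=1e-8, max_iters=10_000):
    """Apply the Bellman optimality backup until the largest update is below tol."""
    states = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    values = {s: 0.0 for s in states}
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_values = {
            s: reward(s) + GAMMA * max(values[step(s, a)] for a in ACTIONS)
            for s in states
        }
        gap = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if gap < tol:
            break
    return values, iteration


values, iterations = value_iteration()
```

Under these assumed dynamics the goal cell satisfies V = 1 + 0.9 V, so its value approaches 10, and a cell k moves away approaches 10 * 0.9^k. A closed form like this is a convenient sanity check before swapping in the real environment.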