Action-Sufficient State Representation Learning for Control with Structural Constraints
Authors: Biwei Huang, Chaochao Lu, Liu Leqi, José Miguel Hernández-Lobato, Clark Glymour, Bernhard Schölkopf, Kun Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on Car Racing and Viz Doom demonstrate a clear advantage of learning and using ASRs for policy learning. |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University, ²University of Cambridge, ³Max Planck Institute for Intelligent Systems, Tübingen, ⁴Mohamed bin Zayed University of Artificial Intelligence. |
| Pseudocode | Yes | Algorithm 1 in Appendix G gives the detailed procedure of model-free policy learning with ASRs in partially observable environments. Algorithm 2 presents the procedure of the classic model-based Dyna algorithm with ASRs. |
| Open Source Code | No | The paper does not provide any explicit statements about the availability of open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | To evaluate the proposed approach, we conducted experiments on both the Car Racing environment (Klimov, 2016), with an illustration in Figure 4, and the Viz Doom environment (Kempka et al., 2016), with an illustration in Figure 5. |
| Dataset Splits | No | The paper describes collecting '10k random rollouts' for data and evaluation metrics like 'average cumulative reward of the 16 random rollouts' and 'tested it over 1024 random rollout scenarios', but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or counts) for the data used to train the models. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and libraries like LSTM, MDN, DDPG, DQN, and DRQN, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The dimensionality of latent states s_t was set to d = 32, determined by hyperparameter tuning. The regularization parameters were set to λ1 = 1, λ2 = 1, λ3 = 1, λ4 = 1, λ5 = 1, λ6 = 6, λ7 = 10, λ8 = 0.1, also determined by hyperparameter tuning. We used 256 hidden units in the LSTM and a five-component Gaussian mixture in the MDN. |
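
For concreteness, below is a minimal sketch of the model configuration quoted in the Experiment Setup row (latent state dimension d = 32, a 256-unit LSTM, and a five-component Gaussian-mixture MDN head), written assuming a PyTorch implementation. The `MDNRNN` module, its interface, and the 3-dimensional action input are illustrative assumptions, not the authors' released code; only the numeric settings (d = 32, 256 LSTM units, 5 mixture components, the λ weights) come from the paper.

```python
import torch
import torch.nn as nn

# Reported settings from the paper's experiment setup (other details assumed).
LATENT_DIM = 32      # dimensionality of latent states s_t
LSTM_HIDDEN = 256    # hidden units in the LSTM
MDN_COMPONENTS = 5   # Gaussian mixture components in the MDN

# Regularization weights lambda_1 .. lambda_8 as reported; how each weight
# enters the training objective is not reproduced here.
LAMBDAS = {f"lambda_{i}": v for i, v in
           enumerate([1, 1, 1, 1, 1, 6, 10, 0.1], start=1)}


class MDNRNN(nn.Module):
    """Hypothetical LSTM over (latent state, action) pairs with an MDN head
    predicting a Gaussian mixture over the next latent state."""

    def __init__(self, action_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(LATENT_DIM + action_dim, LSTM_HIDDEN, batch_first=True)
        # Per mixture component: one weight logit (pi), a mean vector, and a
        # diagonal log-std vector over the latent dimensions.
        out_dim = MDN_COMPONENTS * (1 + 2 * LATENT_DIM)
        self.mdn_head = nn.Linear(LSTM_HIDDEN, out_dim)

    def forward(self, latents, actions, hidden=None):
        x = torch.cat([latents, actions], dim=-1)          # (B, T, d + action_dim)
        h, hidden = self.lstm(x, hidden)                   # (B, T, 256)
        params = self.mdn_head(h)
        pi_logits, mu, log_sigma = torch.split(
            params,
            [MDN_COMPONENTS,
             MDN_COMPONENTS * LATENT_DIM,
             MDN_COMPONENTS * LATENT_DIM],
            dim=-1,
        )
        return pi_logits, mu, log_sigma, hidden


if __name__ == "__main__":
    model = MDNRNN(action_dim=3)                 # e.g. a 3-dim continuous action space
    z = torch.randn(8, 10, LATENT_DIM)           # batch of latent-state rollouts
    a = torch.randn(8, 10, 3)
    pi_logits, mu, log_sigma, _ = model(z, a)
    print(pi_logits.shape, mu.shape)             # (8, 10, 5), (8, 10, 160)
```

This sketch only mirrors the architecture sizes stated in the paper; the loss terms weighted by λ1-λ8 and the policy-learning loop (Algorithms 1 and 2 in the appendices) are not reconstructed here.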