Action-Sufficient State Representation Learning for Control with Structural Constraints

Authors: Biwei Huang, Chaochao Lu, Liu Leqi, José Miguel Hernández-Lobato, Clark Glymour, Bernhard Schölkopf, Kun Zhang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results on Car Racing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning.
Researcher Affiliation | Academia | Carnegie Mellon University; University of Cambridge; Max Planck Institute for Intelligent Systems, Tübingen; Mohamed bin Zayed University of Artificial Intelligence.
Pseudocode | Yes | Algorithm 1 in Appendix G gives the detailed procedure of model-free policy learning with ASRs in partially observable environments; Algorithm 2 presents the procedure of the classic model-based Dyna algorithm with ASRs.
Open Source Code | No | The paper neither states that open-source code for the described methodology is available nor links to a code repository.
Open Datasets | Yes | To evaluate the proposed approach, we conducted experiments on both the Car Racing environment (Klimov, 2016), with an illustration in Figure 4, and the VizDoom environment (Kempka et al., 2016), with an illustration in Figure 5.
Dataset Splits | No | The paper describes collecting "10k random rollouts" and evaluation protocols such as the "average cumulative reward of the 16 random rollouts" and testing "over 1024 random rollout scenarios", but it does not specify explicit training, validation, and test splits (e.g., percentages or counts) for the data used to train the models.
Hardware Specification | No | The paper does not report specific hardware details such as GPU models, CPU models, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions model components and algorithms such as LSTM, MDN, DDPG, DQN, and DRQN, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The dimensionality of the latent states s_t was set to d = 32, determined by hyperparameter tuning. The regularization parameters were set to λ1 = 1, λ2 = 1, λ3 = 1, λ4 = 1, λ5 = 1, λ6 = 6, λ7 = 10, λ8 = 0.1, also determined by hyperparameter tuning. The LSTM used 256 hidden units, and the MDN used a five-component Gaussian mixture.
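The reported hyperparameters (latent dimension d = 32, an LSTM with 256 hidden units, a five-component Gaussian mixture in the MDN) can be wired into a minimal MDN output head to make the parameter layout concrete. The authors' code is not released, so the sketch below is illustrative only: the names (`mdn_split`, `HIDDEN`, the random weight matrix) are assumptions, not the paper's implementation.

```python
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
LATENT_DIM = 32   # d, dimensionality of latent states s_t
HIDDEN = 256      # LSTM hidden units
K = 5             # Gaussian mixture components in the MDN

# Per component the MDN needs a mean and a log-std for every latent
# dimension, plus one mixture logit: K * (2*d + 1) raw outputs in total.
N_PARAMS = K * (2 * LATENT_DIM + 1)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(HIDDEN, N_PARAMS))  # illustrative weights

def mdn_split(h):
    """Map an LSTM hidden state to mixture (weights, means, sigmas)."""
    out = h @ W
    logits, mu, log_sigma = np.split(out, [K, K + K * LATENT_DIM])
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()  # softmax over the K mixture weights
    return pi, mu.reshape(K, LATENT_DIM), np.exp(log_sigma).reshape(K, LATENT_DIM)

pi, mu, sigma = mdn_split(rng.normal(size=HIDDEN))
```

With these settings the MDN head has 325 raw outputs per step; `mdn_split` returns a length-5 weight vector and two 5x32 arrays of per-component means and standard deviations.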