Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning
Authors: Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive empirical comparison with existing RL algorithms for POMDPs on several benchmarks, demonstrating the superior empirical performance of µLV-Rep (Section 7). |
| Researcher Affiliation | Collaboration | 1 University of Alberta 2 UT Austin 3 The Chinese University of Hong Kong, Shenzhen 4 Google DeepMind 5 Georgia Tech. |
| Pseudocode | Yes | Algorithm 1 Online Exploration for L-step decodable POMDPs with Latent Variable Representation |
| Open Source Code | No | The paper does not explicitly state that its source code is made open source or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate the proposed method on Meta-world (Yu et al., 2019), which is an open-source simulated benchmark consisting of 50 distinct robotic manipulation tasks with visual observations. We also provide experiment results on partially observable control problems constructed based on OpenAI Gym MuJoCo (Todorov et al., 2012) in Appendix H.2. |
| Dataset Splits | No | The paper mentions evaluating on Meta-world and MuJoCo tasks but does not specify train/validation/test splits, only referring to 'standard split' or general evaluation protocols. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using SAC (Haarnoja et al., 2018), DreamerV2 (Hafner et al., 2021), MWM (Seo et al., 2023), and VAE (Kingma & Welling, 2013), but it does not specify version numbers for any of these or other software dependencies. |
| Experiment Setup | Yes | More implementation details, including network architectures and hyper-parameters, are provided in Appendix H. Table 2 lists the hyperparameters in µLV-Rep; the numbers in the Conv and MLP entries denote output channels and units. |