Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP
Authors: Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of PORL2 using the partially observed combination lock (pocomblock) as our benchmark, which is inspired by the combination lock benchmark introduced by Misra et al. (2019). ... In our experiment, we compare our method with BRIEE, the latest representation learning algorithm for MDP. ... Figure 2 is the moving average of evaluation returns of pocomblock for PORL2 and BRIEE |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, USA 2School of Mathematical Sciences, Fudan University, Shanghai, China 3School of Electrical Engineering and Computer Science, Oregon State University, OR, USA 4Department of Statistics and Data Science, Yale University, NH, USA. |
| Pseudocode | Yes | Algorithm 1 Partially Observable Representation Learning for L-decodable POMDPs (PORL2-decodable) |
| Open Source Code | Yes | Reproducibility. Our model and code can be found at https://github.com/icmlpomdpexpe/POMDPreplearn. |
| Open Datasets | Yes | We evaluate the performance of PORL2 using the partially observed combination lock (pocomblock) as our benchmark, which is inspired by the combination lock benchmark introduced by Misra et al. (2019). |
| Dataset Splits | No | No explicit statement of training, validation, or test dataset splits (e.g., percentages or counts) was found. The paper describes the custom 'pocomblock' environment used for evaluation and lists hyperparameters in tables, but not dataset splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) were explicitly stated. The paper mentions using a 'two-layer neural network' and an 'SGD' optimizer. |
| Experiment Setup | Yes | We record the hyperparameters we try and the final hyperparameter we use for PORL2 in Table 2 and BRIEE in Table 3. These tables provide specific values for Batch size, Discriminator f number of gradient steps, Horizon, The number of iterations of representation learning, LSVI-LLR bonus coefficient β, LSVI-LLR regularization coefficient λ, Optimizer, Decoder ϕ learning rate, Discriminator f learning rate. |
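For readers assembling a reproduction, the hyperparameter fields reported in the row above can be gathered into a single configuration object. The sketch below mirrors the field names from the paper's Table 2; the default values are illustrative placeholders only (the actual values used are in Tables 2 and 3 of the paper), and the `PORL2Config` name is my own:

```python
from dataclasses import dataclass


@dataclass
class PORL2Config:
    """Hyperparameters named in the paper's Table 2 for PORL2.

    All default values below are placeholders, NOT the paper's values.
    """
    batch_size: int = 512
    discriminator_grad_steps: int = 100   # discriminator f: number of gradient steps
    horizon: int = 10                     # episode horizon H
    rep_learning_iters: int = 20          # iterations of representation learning
    lsvi_llr_bonus_beta: float = 1.0      # LSVI-LLR bonus coefficient beta
    lsvi_llr_reg_lambda: float = 1.0      # LSVI-LLR regularization coefficient lambda
    optimizer: str = "SGD"                # the paper reports SGD as the optimizer
    decoder_phi_lr: float = 1e-3          # decoder phi learning rate
    discriminator_f_lr: float = 1e-3      # discriminator f learning rate


config = PORL2Config()
print(config.optimizer)
```

Collecting the fields this way makes it easy to log the full configuration alongside each run, which is exactly the information the paper's hyperparameter tables record.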