The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs
Authors: Johannes Müller, Guido Montúfar
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we use a navigation problem in a grid world to demonstrate that the polynomial programming formulation can offer a computationally feasible approach to the reward maximization problem. ... Using the modeling language JuMP and the interior point solver Ipopt we directly obtained the globally optimal solution to problem (29) ... The computations took 0.01s (on a 2 GHz Quad-Core Intel Core i5 processor). |
| Researcher Affiliation | Academia | Johannes Müller, Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany, jmueller@mis.mpg.de; Guido Montúfar, Department of Mathematics and Department of Statistics, UCLA, CA, USA, and Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany, montufar@math.ucla.edu |
| Pseudocode | Yes | Algorithm 1 Polynomial programming for POMDPs |
| Open Source Code | Yes | The Julia code is available in the supplements and under https://github.com/muellerjohannes/geometry-POMDPs-ICLR-2022. |
| Open Datasets | No | The paper uses custom-defined problem instances (e.g., a grid world) rather than established, publicly accessible datasets with formal access information. The problem parameters are described in the text. |
| Dataset Splits | No | The paper describes problem setups and solves them using an optimization approach, rather than training a machine learning model with distinct training/validation/test splits. |
| Hardware Specification | Yes | The computations took 0.01s (on a 2 GHz Quad-Core Intel Core i5 processor). ... The solver took around 0.03s consistently (on a 2 GHz Quad-Core Intel Core i5 processor). |
| Software Dependencies | No | The paper mentions using "JuMP" and "Ipopt" (Julia packages) and "Python Sum Of Squares", but it does not specify their version numbers. |
| Experiment Setup | Yes | For the toy example: "We consider state, observation, and action spaces with two elements each, as well as following deterministic transition mechanism α, observation mechanism β, and instantaneous reward r...". For the grid world: "We consider the grid world depicted in Figure 6 with 13 states and 7 observations... The four actions are {R, L, U, D}... The transitions are deterministic... Let us now consider the uniform distribution µ_s = 1/13 for s ∈ S as an initial distribution and γ = 0.999 as a discount factor." |
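
To make the quantities in the Experiment Setup row concrete, the following is a minimal Python sketch of evaluating the discounted reward of a memoryless stochastic policy in a two-state, two-observation, two-action POMDP. It is not the authors' Julia code and does not implement their polynomial program. The excerpt above elides the paper's actual α, β, and r, so the values used here are hypothetical placeholders, and γ = 0.9 is likewise an illustrative choice. The function name `discounted_reward` and the normalization by (1 − γ) are assumptions of this sketch.

```python
def discounted_reward(alpha, beta, r, pi, mu, gamma, tol=1e-12, max_iter=1_000_000):
    """Normalized discounted reward of a memoryless policy (illustrative sketch).

    alpha[s][a][t]: probability of moving from state s to state t under action a
    beta[s][o]:     probability of observing o in state s
    r[s][a]:        instantaneous reward
    pi[o][a]:       memoryless policy, probability of action a given observation o
    mu[s]:          initial state distribution
    """
    n_states, n_obs, n_actions = len(mu), len(pi), len(pi[0])

    # State-level policy induced by the observation policy: tau = beta * pi.
    tau = [[sum(beta[s][o] * pi[o][a] for o in range(n_obs))
            for a in range(n_actions)] for s in range(n_states)]

    # State transition matrix and expected one-step reward under tau.
    P = [[sum(tau[s][a] * alpha[s][a][t] for a in range(n_actions))
          for t in range(n_states)] for s in range(n_states)]
    r_pi = [sum(tau[s][a] * r[s][a] for a in range(n_actions))
            for s in range(n_states)]

    # Iterative policy evaluation: v = r_pi + gamma * P v.
    v = [0.0] * n_states
    for _ in range(max_iter):
        v_new = [r_pi[s] + gamma * sum(P[s][t] * v[t] for t in range(n_states))
                 for s in range(n_states)]
        done = max(abs(x - y) for x, y in zip(v_new, v)) < tol
        v = v_new
        if done:
            break

    # Assumed normalization: (1 - gamma) times the expected discounted return.
    return (1 - gamma) * sum(mu[s] * v[s] for s in range(n_states))


# Hypothetical instance (NOT the paper's toy example):
# action a deterministically leads to state a, observations reveal the state,
# and the reward is 1 when the action index matches the current state.
alpha = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
beta = [[1.0, 0.0], [0.0, 1.0]]
r = [[1.0, 0.0], [0.0, 1.0]]
mu = [0.5, 0.5]

matching_policy = [[1.0, 0.0], [0.0, 1.0]]  # act according to the observation
uniform_policy = [[0.5, 0.5], [0.5, 0.5]]   # ignore the observation

reward_matching = discounted_reward(alpha, beta, r, matching_policy, mu, gamma=0.9)
reward_uniform = discounted_reward(alpha, beta, r, uniform_policy, mu, gamma=0.9)
```

In this hypothetical instance the matching policy collects reward 1 at every step and the uniform policy collects 1/2 in expectation, so the two normalized rewards differ by a factor of two. Fixed-point iteration is used here only to keep the sketch dependency-free; a direct linear solve would serve equally well.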