reproducibilityindex.ai

The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs

Authors: Johannes Müller, Guido Montúfar

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we use a navigation problem in a grid world to demonstrate that the polynomial programming formulation can offer a computationally feasible approach to the reward maximization problem. ... Using the modeling language JuMP and the interior point solver Ipopt we directly obtained the globally optimal solution to problem (29) ... The computations took 0.01s (on a 2 GHz Quad-Core Intel Core i5 processor).
Researcher Affiliation	Academia	Johannes Müller, Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany, jmueller@mis.mpg.de; Guido Montúfar, Department of Mathematics and Department of Statistics, UCLA, CA, USA, and Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany, montufar@math.ucla.edu
Pseudocode	Yes	Algorithm 1 Polynomial programming for POMDPs
Open Source Code	Yes	The Julia code is available in the supplements and under https://github.com/muellerjohannes/geometry-POMDPs-ICLR-2022.
Open Datasets	No	The paper uses custom-defined problem instances (e.g., a grid world) rather than established, publicly accessible datasets with formal access information. The problem parameters are described in the text.
Dataset Splits	No	The paper describes problem setups and solves them using an optimization approach, rather than training a machine learning model with distinct training/validation/test splits.
Hardware Specification	Yes	The computations took 0.01s (on a 2 GHz Quad-Core Intel Core i5 processor). ... The solver took around 0.03s consistently (on a 2 GHz Quad-Core Intel Core i5 processor).
Software Dependencies	No	The paper mentions using "JuMP" and "Ipopt" (Julia packages) and "Python Sum Of Squares", but it does not specify their version numbers.
Experiment Setup	Yes	For the toy example: "We consider state, observation, and action spaces with two elements each, as well as following deterministic transition mechanism α, observation mechanism β, and instantaneous reward r...". For the grid world: "We consider the grid world depicted in Figure 6 with 13 states and 7 observations... The four actions are {R, L, U, D}... The transitions are deterministic... Let us now consider the uniform distribution μ_s = 1/13 for s ∈ S as an initial distribution and γ = 0.999 as a discount factor."