How to Explore with Belief: State Entropy Maximization in POMDPs
Authors: Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide an empirical corroboration of the proposed methods and reported claims (results reported in Figure 2 and 3). |
| Researcher Affiliation | Academia | (1) Politecnico di Milano, Milan, Italy. (2) Technion Israel Institute of Technology, Haifa, Israel. |
| Pseudocode | Yes | Algorithm 1 Reg-PG for Max Ent POMDPs |
| Open Source Code | Yes | The code is available at this link. |
| Open Datasets | No | The paper uses custom Gridworld environments ("5x5-Gridworld", "6x6-Gridworld") without providing a specific link, DOI, or formal citation, and without stating that they are publicly available as datasets. |
| Dataset Splits | No | The paper describes the experimental environments and reports results over multiple runs but does not explicitly provide training, validation, or test dataset splits, percentages, or sample counts. |
| Hardware Specification | No | Table 1. Wall-clock time [sec] of the main experiments on general-purpose CPUs. No specific CPU or GPU models, memory details, or other hardware specifications are provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks, solvers) are mentioned in the paper. |
| Experiment Setup | Yes | The learning rate was selected as α = 0.3. The batch size was selected to be N = 10 after tuning. As for the time horizon, T = S in all the experiments. This makes the exploration task more challenging as every state can be visited at most once. The best regularization term ρ was found to be approximately equal to 0.02. |
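For illustration, the reported hyperparameters can be collected into a small sketch. This is not the authors' code: the function names and the update rule below are hypothetical, showing only how the reported values (learning rate α = 0.3, batch size N = 10, horizon T = S, regularization ρ ≈ 0.02) would plug into a generic regularized policy-gradient step.

```python
# Illustrative sketch only -- hypothetical names, not the paper's implementation.
import numpy as np

# Hyperparameters as reported in the paper's experiment setup.
CONFIG = {
    "learning_rate": 0.3,    # alpha, selected learning rate
    "batch_size": 10,        # N, selected after tuning
    "regularization": 0.02,  # rho, best regularization term found
}

def horizon(num_states: int) -> int:
    """T = S in all experiments: the horizon equals the number of states,
    so every state can be visited at most once."""
    return num_states

def reg_pg_step(theta, grad_objective, grad_regularizer, cfg=CONFIG):
    """One generic gradient-ascent step on a rho-regularized objective
    (a toy stand-in for a Reg-PG update; not the authors' algorithm)."""
    g = grad_objective - cfg["regularization"] * grad_regularizer
    return theta + cfg["learning_rate"] * g
```

For a 5x5 Gridworld (S = 25 states), this setup implies a horizon of T = 25, which is what makes the exploration task challenging: no state can be revisited within an episode.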