How to Explore with Belief: State Entropy Maximization in POMDPs

Authors: Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this section, we provide an empirical corroboration of the proposed methods and reported claims (results reported in Figure 2 and 3)." |
| Researcher Affiliation | Academia | "1 Politecnico di Milano, Milan, Italy. 2 Technion Israel Institute of Technology, Haifa, Israel." |
| Pseudocode | Yes | "Algorithm 1 Reg-PG for Max Ent POMDPs" (an illustrative sketch is given below the table). |
| Open Source Code | Yes | "The code is available at this link." |
| Open Datasets | No | The paper uses custom Gridworld environments ("5x5-Gridworld", "6x6-Gridworld") without a link, DOI, or formal citation, and does not state that they are publicly available as datasets. |
| Dataset Splits | No | The paper describes the experimental environments and reports results over multiple runs, but it does not provide training, validation, or test splits, percentages, or sample counts. |
| Hardware Specification | No | "Table 1. Wall-clock time [sec] of the main experiments on general-purpose CPUs." No specific CPU or GPU models, memory sizes, or other hardware details are provided. |
| Software Dependencies | No | No software dependencies with version numbers (e.g., programming languages, libraries, frameworks, solvers) are listed. |
| Experiment Setup | Yes | "The learning rate was selected as α = 0.3. The batch size was selected to be N = 10 after tuning. As for the time horizon, T = S in all the experiments. This makes the exploration task more challenging as every state can be visited at most once. The best regularization term ρ was found to be approximately equal to 0.02." (These values are reused in the sketch below.) |
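For illustration only, the following is a minimal sketch of how a REINFORCE-style regularized policy gradient for state-entropy maximization could be wired up with the hyperparameters reported above (α = 0.3, N = 10, T = S, ρ ≈ 0.02). This is not the authors' Reg-PG implementation: the gridworld dynamics, the observation-noise level, the use of the empirical entropy of observed visitation counts as the trajectory return, and the L2 form of the ρ-weighted regularizer are all assumptions made to keep the example self-contained and runnable.

```python
# Minimal, illustrative sketch of a regularized policy-gradient (REINFORCE-style)
# update for trajectory-level state-entropy maximization in a noisy gridworld.
# Hyperparameters follow the paper's reported values; everything else is assumed.
import numpy as np

rng = np.random.default_rng(0)

SIDE = 5                      # "5x5-Gridworld"
S = SIDE * SIDE               # number of states
A = 4                         # actions: up, down, left, right
T = S                         # horizon T = S, as reported in the paper
N = 10                        # batch size, as reported
ALPHA = 0.3                   # learning rate, as reported
RHO = 0.02                    # regularization coefficient, as reported
OBS_NOISE = 0.1               # assumed: prob. of observing a random cell instead of the true one

def step(state, action):
    """Deterministic gridworld transition with walls (no wrap-around)."""
    r, c = divmod(state, SIDE)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
    r, c = min(max(r + dr, 0), SIDE - 1), min(max(c + dc, 0), SIDE - 1)
    return r * SIDE + c

def observe(state):
    """Noisy observation: the true cell most of the time, a random cell otherwise."""
    return state if rng.random() > OBS_NOISE else rng.integers(S)

def entropy(visits):
    """Empirical entropy of the visitation distribution of one trajectory."""
    p = visits / visits.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

theta = np.zeros((S, A))      # tabular softmax policy conditioned on the observation

for it in range(200):
    grads, returns = [], []
    for _ in range(N):
        state, visits = rng.integers(S), np.zeros(S)
        g = np.zeros_like(theta)
        for _ in range(T):
            obs = observe(state)
            visits[obs] += 1.0                     # entropy proxy built on observations
            probs = np.exp(theta[obs] - theta[obs].max())
            probs /= probs.sum()
            action = rng.choice(A, p=probs)
            g[obs] += np.eye(A)[action] - probs    # grad of log softmax policy
            state = step(state, action)
        grads.append(g)
        returns.append(entropy(visits))
    returns = np.array(returns)
    baseline = returns.mean()                      # simple variance-reduction baseline
    pg = sum((R - baseline) * g for R, g in zip(returns, grads)) / N
    theta += ALPHA * (pg - RHO * theta)            # regularized gradient-ascent step
```

In this sketch, each trajectory's return is the empirical entropy of the cells it observes itself visiting, so the ascent direction favours policies that spread their visitation across the grid; the ρ-weighted term simply shrinks the tabular parameters and stands in for whatever regularizer the paper actually uses.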