reproducibilityindex.ai

How to Explore with Belief: State Entropy Maximization in POMDPs

Authors: Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we provide an empirical corroboration of the proposed methods and reported claims (results reported in Figure 2 and 3).
Researcher Affiliation	Academia	¹Politecnico di Milano, Milan, Italy. ²Technion Israel Institute of Technology, Haifa, Israel.
Pseudocode	Yes	Algorithm 1: Reg-PG for MaxEnt POMDPs
Open Source Code	Yes	The code is available at this link.
Open Datasets	No	The paper uses custom Gridworld environments ("5x5-Gridworld", "6x6-Gridworld") without providing a specific link, DOI, formal citation, or stating their public availability as a dataset.
Dataset Splits	No	The paper describes the experimental environments and reports results over multiple runs but does not explicitly provide training, validation, or test dataset splits, percentages, or sample counts.
Hardware Specification	No	The paper reports wall-clock times (Table 1, in seconds) for the main experiments on "general-purpose CPUs", but gives no specific CPU or GPU models, memory details, or other hardware specifications.
Software Dependencies	No	No specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks, solvers) are mentioned in the paper.
Experiment Setup	Yes	The learning rate was selected as α = 0.3. The batch size was selected to be N = 10 after tuning. As for the time horizon, T = S in all the experiments. This makes the exploration task more challenging as every state can be visited at most once. The best regularization term ρ was found to be approximately equal to 0.02.
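
The Experiment Setup values can be collected into a small configuration sketch. This is not taken from the paper's code: the class and field names are hypothetical, "T = S" is assumed to mean that the horizon equals the number of states, and the 5x5 gridworld is assumed to have 25 reachable states. The helper computes the Shannon entropy of the empirical state-visitation distribution, which illustrates why this horizon is demanding: with T equal to the number of states, the maximum entropy log |S| is reached only if no state is visited twice.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters quoted in the Experiment Setup row (names are hypothetical)."""
    num_states: int               # |S|; 25 is an assumption for the 5x5 gridworld
    learning_rate: float = 0.3    # alpha
    batch_size: int = 10          # N
    regularization: float = 0.02  # rho, reported as approximate

    @property
    def horizon(self) -> int:
        # "T = S" is assumed to mean T equals the number of states.
        return self.num_states


def empirical_state_entropy(states):
    """Shannon entropy (in nats) of the empirical state-visitation distribution."""
    counts = Counter(states)
    total = len(states)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


config = ExperimentConfig(num_states=25)
full_sweep = list(range(config.horizon))  # every state visited exactly once
stuck = [0] * config.horizon              # agent never leaves state 0

print(empirical_state_entropy(full_sweep), math.log(config.num_states))
print(empirical_state_entropy(stuck))
```

The first trajectory attains the upper bound log |S|, while the second has zero entropy; any trajectory of length T that revisits a state falls strictly between the two.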