Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Spectral Learning for Infinite-Horizon Average-Reward POMDPs

Authors: Alessio Russo, Alberto Maria Metelli, Marcello Restelli

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we present numerical simulations that validate the theoretical analysis of both the proposed estimation procedure and the Mixed Spectral UCRL algorithm.
Researcher Affiliation | Academia | Alessio Russo DEIB, Politecnico di Milano EMAIL Alberto Maria Metelli DEIB, Politecnico di Milano EMAIL Marcello Restelli DEIB, Politecnico di Milano EMAIL
Pseudocode | Yes | Algorithm 1 Mixed Spectral Estimation. 1: Input: Trajectory set $\Gamma = \{\tau_l\}_{l=0}^{L-1}$, where for each $l$ we have $\tau_l = \{(o^l_j, a^l_j)\}_{j=0}^{N_l-1}$ 2: Output: Estimated observation model $\hat{O}$ and transition model $\{\hat{T}_a\}_{a \in \mathcal{A}}$ 3: for $a \in \mathcal{A}$ do
Open Source Code | Yes | The codebase can be found at https://github.com/alesnow97/Spectral_Learning_POMDP.git.
Open Datasets | No | For the generation of the different POMDPs, we adopted a similar approach to the one followed in [25]. The matrices of both the observation and transition models are randomly generated, and successive modifications are applied:
Dataset Splits | No | The simulation splits the interaction horizon into 10 episodes of equal length, and for each episode, we use a different belief-based policy for data collection.
Hardware Specification | Yes | The simulations illustrated in this work have been run on 88 Intel(R) Xeon(R) E7-8880 v4 CPUs @ 2.20 GHz with 94 GB of RAM.
Software Dependencies | No | The paper mentions that the codebase is available at a GitHub link, implying that standard software such as Python would be used, but it does not specify any particular software, libraries, or frameworks with their version numbers.
Experiment Setup | Yes | For the experiments on the regret, we adopted the following hyperparameters for the different algorithms. Mixed Spectral UCRL: length of initial episode $T_0 = 3 \cdot 10^5$; SM-UCRL: length of initial episode $T_0 = 3 \cdot 10^5$, minimum action probability $\iota = 0.02$; SEEU: length of exploration phase $\tau_1 = 10^5$, length of initial exploitation phase $\tau_2 = 3 \cdot 10^5$. At each new episode $l$, the length of the exploitation phase is computed as $\sqrt{l} \cdot \tau_2$, as defined in the original work. ... For the considered simulations, we adopted a discretization step size of 0.04.
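
The quantities quoted in the table can be illustrated with a minimal Python sketch. It is not taken from the authors' codebase. All function names are hypothetical, the uniform sampling of matrix entries is an assumption (the quoted text only says the matrices are "randomly generated" and does not list the successive modifications, which are not reproduced here), and the episode index l is assumed to start at 1 so that the first exploitation phase has length τ2.

```python
import math
import random


def random_stochastic_matrix(rows, cols, rng):
    """Sample a matrix whose rows are probability distributions.

    Assumption: entries are drawn uniformly and then normalized per row.
    The paper's subsequent modifications are not modeled.
    """
    matrix = []
    for _ in range(rows):
        weights = [rng.random() for _ in range(cols)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


def equal_episodes(horizon, num_episodes=10):
    """Split the steps [0, horizon) into equal-length (start, end) episodes.

    Any remainder of horizon // num_episodes is dropped in this sketch.
    """
    length = horizon // num_episodes
    return [(k * length, (k + 1) * length) for k in range(num_episodes)]


def seeu_exploitation_length(episode, tau2=3 * 10**5):
    """Exploitation phase length sqrt(l) * tau2 for episode l (assumed l >= 1)."""
    return int(math.sqrt(episode) * tau2)


if __name__ == "__main__":
    rng = random.Random(0)
    transition = random_stochastic_matrix(3, 3, rng)
    print([round(sum(row), 6) for row in transition])
    print(equal_episodes(1000)[:2])
    print([seeu_exploitation_length(l) for l in (1, 4, 9)])
```

Under these assumptions, the exploitation phase grows with the square root of the episode index, so episode 4 runs twice as long as episode 1, while the data-collection horizon for the estimation experiments is cut into ten equal blocks, one per belief-based policy.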