Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Recurrent Natural Policy Gradient for POMDPs
Authors: Semih Cayci, Atilla Eryilmaz
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide non-asymptotic theoretical guarantees for this method, including bounds on sample and iteration complexity to achieve global optimality up to function approximation. Additionally, we characterize pathological cases that stem from long-term dependencies, thereby explaining limitations of RNN-based policy optimization for POMDPs... We study the performance of Rec-TD numerically in Section C under long-term and short-term dependencies to validate our theoretical results in Section 5.2... The performance of Rec-TD is studied numerically in Random-POMDP instances in Section C. |
| Researcher Affiliation | Academia | Semih Cayci EMAIL Department of Mathematics RWTH Aachen University... Atilla Eryilmaz EMAIL Department of Electrical and Computer Engineering The Ohio State University |
| Pseudocode | Yes | Algorithm 1: Recurrent Natural Actor-Critic (Rec-NAC), a high-level description |
| Open Source Code | No | No explicit statement or link to source code for the described methodology is provided in the paper. |
| Open Datasets | No | The paper mentions numerical experiments using "randomly-generated finite POMDP instance" but does not provide access information or specify a publicly available dataset. |
| Dataset Splits | No | The paper describes generating random POMDP instances and performing "5 trials" but does not specify any train/test/validation splits, cross-validation setup, or other data partitioning methodology. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | We first consider the performance of Rec-TD with learning rate η = 0.05, discount factor γ = 0.9 and RNNs with various choices of network width m. For p_exp = 0.8, the performance of Rec-TD is demonstrated in Figure 2... The exploration probability is reduced to p_exp = 0.25... |