Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Periodic agent-state based Q-learning for POMDPs
Authors: Amit Sinha, Matthieu Geist, Aditya Mahajan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary policies. |
| Researcher Affiliation | Collaboration | Amit Sinha¹, Matthieu Geist², and Aditya Mahajan¹; ¹McGill University, Mila; ²Cohere |
| Pseudocode | No | The paper describes algorithms (PASQL) and their update rules using mathematical notation, but it does not include any clearly labeled "Pseudocode" or "Algorithm" blocks with structured, step-by-step procedures. |
| Open Source Code | No | We intend to make the code open access after the review process is complete. |
| Open Datasets | No | The paper defines custom POMDP models (Example 1 and Example 2) within the text for its numerical experiments, rather than using or providing access to external, publicly available datasets. For instance, "Example 1 Consider a POMDP with S = {0, 1, . . . , 5}, A = {0, 1}, Y = {0, 1} and γ = 0.9. The dynamics are as shown in Fig. 2." |
| Dataset Splits | No | The paper mentions running experiments for "25 random seeds" but does not specify explicit training, validation, and test dataset splits in terms of percentages or sample counts for any dataset. |
| Hardware Specification | No | The numerical experiments were enabled in part by support provided by Calcul Québec and Compute Canada. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that would allow for reproducible setup of the environment. |
| Experiment Setup | Yes | The hyperparameters for the numerical experiments presented in Sec. 3 are shown in App. H. Table 3 (hyperparameters used in Ex. 1): training steps 10^6; start learn rate 10^-3; end learn rate 10^-5; learn rate schedule exponential; exponential decay rate 1.0; number of random seeds 25. |
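
The Experiment Setup row reports an exponential learning-rate schedule running from 10^-3 to 10^-5 over 10^6 training steps. The sketch below shows one common reading of such a schedule, a geometric interpolation between the two rates. The function name `exponential_lr` and the interpolation formula are assumptions made for illustration; the summary does not say how the paper's "exponential decay rate 1.0" enters the schedule, so that parameter is left out.

```python
import math


def exponential_lr(step, total_steps=10**6, start_lr=1e-3, end_lr=1e-5):
    """Learning rate at `step` under an ASSUMED geometric interpolation.

    The rate is multiplied by a constant factor each step so that it
    moves from start_lr (step 0) to end_lr (step total_steps).
    This is a hypothetical form, not confirmed by the paper.
    """
    # Clamp progress to [0, 1] so out-of-range steps stay within bounds.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start_lr * (end_lr / start_lr) ** frac


if __name__ == "__main__":
    # Print the rate at the start, midpoint, and end of training.
    for step in (0, 500_000, 1_000_000):
        print(f"step {step:>9,d}: lr = {exponential_lr(step):.2e}")
```

Under this assumed form the rate falls by the same factor over every equal stretch of training, so it reaches the geometric mean of the two endpoints at the halfway point.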