Periodic agent-state based Q-learning for POMDPs
Authors: Amit Sinha, Matthieu Geist, Aditya Mahajan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary policies. |
| Researcher Affiliation | Collaboration | Amit Sinha¹, Matthieu Geist², and Aditya Mahajan¹; ¹McGill University, Mila; ²Cohere |
| Pseudocode | No | The paper describes algorithms (PASQL) and their update rules using mathematical notation, but it does not include any clearly labeled "Pseudocode" or "Algorithm" blocks with structured, step-by-step procedures. |
| Open Source Code | No | We intend to make the code open access after the review process is complete. |
| Open Datasets | No | The paper defines custom POMDP models (Example 1 and Example 2) within the text for its numerical experiments, rather than using or providing access to external, publicly available datasets. For instance, "Example 1 Consider a POMDP with S = {0, 1, ..., 5}, A = {0, 1}, Y = {0, 1} and γ = 0.9. The dynamics are as shown in Fig. 2." |
| Dataset Splits | No | The paper mentions running experiments for "25 random seeds" but does not specify explicit training, validation, and test dataset splits in terms of percentages or sample counts for any dataset. |
| Hardware Specification | No | The numerical experiments were enabled in part by support provided by Calcul Québec and Compute Canada. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that would allow for reproducible setup of the environment. |
| Experiment Setup | Yes | The hyperparameters for the numerical experiments presented in Sec. 3 are shown in App. H. Table 3 (hyperparameters used in Ex. 1): training steps 10^6; start learn rate 10^-3; end learn rate 10^-5; learn rate schedule exponential; exponential decay rate 1.0; number of random seeds 25. |
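
To make the Experiment Setup row concrete, the sketch below shows one way the reported learning-rate endpoints could be turned into a schedule. It is an illustration under stated assumptions, not the authors' implementation: the geometric interpolation formula and the names `learning_rate`, `START_LR`, `END_LR`, `TRAINING_STEPS`, and `NUM_SEEDS` are hypothetical. The excerpt does not say how the "exponential decay rate 1.0" enters the schedule, so the sketch does not use it.

```python
# Values taken from the Table 3 excerpt quoted in the Experiment Setup row.
START_LR = 1e-3           # start learn rate
END_LR = 1e-5             # end learn rate
TRAINING_STEPS = 10**6    # training steps
NUM_SEEDS = 25            # number of random seeds


def learning_rate(step: int) -> float:
    """Exponentially decay the learning rate from START_LR to END_LR.

    ASSUMPTION: geometric interpolation between the two reported endpoints.
    The paper's exact formula (and the role of its decay-rate parameter)
    is not given in the excerpt.
    """
    # Clamp the step so the schedule stays within the reported range.
    clamped = min(max(step, 0), TRAINING_STEPS)
    frac = clamped / TRAINING_STEPS
    return START_LR * (END_LR / START_LR) ** frac


if __name__ == "__main__":
    # Print the schedule at the start, midpoint, and end of training.
    for step in (0, TRAINING_STEPS // 2, TRAINING_STEPS):
        print(f"step={step:>7d}  lr={learning_rate(step):.3e}")
    # One independent run per seed; the seeds themselves are hypothetical.
    print("seeds:", list(range(NUM_SEEDS)))
```

Under this assumed form the rate falls by a constant factor per step, so it reaches the geometric mean of the two endpoints halfway through training.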