Periodic agent-state based Q-learning for POMDPs

Authors: Amit Sinha, Matthieu Geist, Aditya Mahajan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary policies.
Researcher Affiliation | Collaboration | Amit Sinha (1), Matthieu Geist (2), and Aditya Mahajan (1); (1) McGill University, Mila; (2) Cohere
Pseudocode | No | The paper describes algorithms (PASQL) and their update rules using mathematical notation, but it does not include any clearly labeled "Pseudocode" or "Algorithm" blocks with structured, step-by-step procedures.
Open Source Code | No | We intend to make the code open access after the review process is complete.
Open Datasets | No | The paper defines custom POMDP models (Example 1 and Example 2) within the text for its numerical experiments, rather than using or providing access to external, publicly available datasets. For instance, "Example 1 Consider a POMDP with S = {0, 1, ..., 5}, A = {0, 1}, Y = {0, 1} and γ = 0.9. The dynamics are as shown in Fig. 2."
Dataset Splits | No | The paper mentions running experiments for "25 random seeds" but does not specify explicit training, validation, and test dataset splits in terms of percentages or sample counts for any dataset.
Hardware Specification | No | The numerical experiments were enabled in part by support provided by Calcul Québec and Compute Canada.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that would allow for reproducible setup of the environment.
Experiment Setup | Yes | The hyperparameters for the numerical experiments presented in Sec. 3 are shown in App. H. Table 3 (Hyperparameters used in Ex. 1): Training steps = 10^6; Start learn rate = 10^-3; End learn rate = 10^-5; Learn rate schedule = Exponential; Exponential decay rate = 1.0; Number of random seeds = 25.
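To make the Experiment Setup entry concrete, the following is a minimal sketch of one way an exponential learning-rate schedule could move from the reported start rate (10^-3) to the reported end rate (10^-5) over the reported 10^6 training steps. The function name `exponential_lr` and the geometric-interpolation form are assumptions made for illustration. The summary does not say how the "Exponential decay rate 1.0" entry enters the authors' schedule, so that parameter is not modeled here, and this should not be read as the paper's implementation.

```python
def exponential_lr(step, total_steps=10**6, start_lr=1e-3, end_lr=1e-5):
    """Hypothetical schedule: geometric interpolation from start_lr to end_lr.

    Defaults are the values listed in Table 3; the functional form itself is
    an assumption, not something stated in the source.
    """
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    # Equal multiplicative decay per step: lr = start * (end / start) ** frac.
    return start_lr * (end_lr / start_lr) ** frac


if __name__ == "__main__":
    for step in (0, 250_000, 500_000, 750_000, 1_000_000):
        print(step, exponential_lr(step))
```

Under this assumed form the rate falls by the same factor on every step, so it passes through the geometric mean of the two endpoints halfway through training.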