Stick-Breaking Policy Learning in Dec-POMDPs
Authors: Miao Liu, Christopher Amato, Xuejun Liao, Lawrence Carin, Jonathan P. How
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods. |
| Researcher Affiliation | Academia | Miao Liu, MIT, Cambridge, MA (miaoliu@mit.edu); Christopher Amato, University of New Hampshire, Durham, NH (camato@cs.unh.edu); Xuejun Liao and Lawrence Carin, Duke University, Durham, NC ({xjliao,lcarin}@duke.edu); Jonathan P. How, MIT, Cambridge, MA (jhow@mit.edu) |
| Pseudocode | Yes | Algorithm 1 Batch VB Inference for Dec-SBPR |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | Downloaded from http://rbr.cs.umass.edu/camato/decpomdp/download.html |
| Dataset Splits | No | The paper mentions using 'K = 300 episodes' for learning and '100 test episodes' for evaluation, but it does not specify explicit train/validation/test splits by percentage or count, nor does it mention a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | For Dec-SBPR, the hyperparameters in (8) are set to c = 0.1 and d = 10^-6 to promote sparse usage of FSC nodes. The policies are initialized as FSCs converted from the episodes with the highest rewards using a method similar to [Amato and Zilberstein, 2009]. |
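To make the hyperparameters in the setup row concrete, the sketch below illustrates a truncated stick-breaking construction of the kind Dec-SBPR places over FSC nodes. This is a minimal illustration, not the paper's algorithm: the Beta(1, α) sticks and the Gamma(c, d) rate parameterization for the concentration α are assumptions chosen to show why small d (here 10^-6) yields a diffuse prior on α and hence sparse node usage.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(alpha, num_nodes, rng):
    """Truncated stick-breaking: V_k ~ Beta(1, alpha),
    pi_k = V_k * prod_{j<k} (1 - V_j).
    Small alpha concentrates mass on the first few nodes (sparse usage)."""
    v = rng.beta(1.0, alpha, size=num_nodes)
    # Length of stick remaining before each break.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

# Hyperparameters quoted from the paper: c = 0.1, d = 10^-6.
# Assumption: Gamma(shape=c, rate=d), i.e. scale = 1/d, on the concentration.
c, d = 0.1, 1e-6
alpha = rng.gamma(shape=c, scale=1.0 / d)

# Weights over a truncation of 50 FSC nodes (truncation level is illustrative).
weights = stick_breaking_weights(alpha, 50, rng)
```

The weights are non-negative and sum to at most 1; the leftover mass corresponds to nodes beyond the truncation level.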