Stick-Breaking Policy Learning in Dec-POMDPs

Authors: Miao Liu, Christopher Amato, Xuejun Liao, Lawrence Carin, Jonathan P. How

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
Researcher Affiliation | Academia | Miao Liu (MIT, Cambridge, MA, miaoliu@mit.edu); Christopher Amato (University of New Hampshire, Durham, NH, camato@cs.unh.edu); Xuejun Liao and Lawrence Carin (Duke University, Durham, NC, {xjliao,lcarin}@duke.edu); Jonathan P. How (MIT, Cambridge, MA, jhow@mit.edu)
Pseudocode | Yes | Algorithm 1: Batch VB Inference for Dec-SBPR
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | Downloaded from http://rbr.cs.umass.edu/camato/decpomdp/download.html
Dataset Splits | No | The paper mentions using 'K = 300 episodes' for learning and '100 test episodes' for evaluation, but it does not specify explicit train/validation/test splits by percentages or counts, nor does it explicitly mention a validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | For Dec-SBPR, the hyperparameters in (8) are set to c = 0.1 and d = 10⁻⁶ to promote sparse usage of FSC nodes. The policies are initialized as FSCs converted from the episodes with the highest rewards, using a method similar to [Amato and Zilberstein, 2009].
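For context on the experiment-setup row above: Dec-SBPR places a stick-breaking prior over finite-state-controller (FSC) nodes, with a Gamma hyperprior whose hyperparameters c and d take the values quoted in the table. The following minimal NumPy sketch (not the authors' code) illustrates how a truncated stick-breaking construction with a small concentration parameter concentrates probability mass on a few controller nodes; the shape/scale parameterization of the Gamma draw, the truncation level of 50 nodes, and the 1e-3 threshold are assumptions made here for illustration, not details taken from the paper.

```python
import numpy as np

def stick_breaking_weights(alpha, max_nodes=50, rng=None):
    """Truncated stick-breaking weights: pi_k = V_k * prod_{j<k}(1 - V_j), V_k ~ Beta(1, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    betas = rng.beta(1.0, alpha, size=max_nodes)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(seed=0)
c, d = 0.1, 1e-6  # hyperparameter values quoted in the table
# Assumption: shape/scale parameterization of the Gamma hyperprior,
# which yields a very small concentration parameter (guarded away from 0).
alpha = max(rng.gamma(shape=c, scale=d), 1e-8)
weights = stick_breaking_weights(alpha, max_nodes=50, rng=rng)
print("node weights (first 5):", np.round(weights[:5], 4))
print("effective nodes (weight > 1e-3):", int((weights > 1e-3).sum()))
```

Under these assumptions the printed count of effective nodes is typically 1–2, which matches the stated motivation of promoting sparse usage of FSC nodes.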