QMDP-Net: Deep Learning for Planning under Partial Observability
Authors: Peter Karkus, David Hsu, Wee Sun Lee
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning. |
| Researcher Affiliation | Academia | Peter Karkus (1,2), David Hsu (1,2), Wee Sun Lee (2); 1: NUS Graduate School for Integrative Sciences and Engineering; 2: School of Computing, National University of Singapore; {karkus, dyhsu, leews}@comp.nus.edu.sg |
| Pseudocode | No | The paper describes the architecture and algorithms in text and figures, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation in Tensorflow [1] is available online at http://github.com/AdaCompNUS/qmdp-net. |
| Open Datasets | No | We trained a policy using expert trajectories from 10,000 random environments, 5 trajectories from each environment. |
| Dataset Splits | No | The paper mentions training and testing sets, but does not explicitly describe a separate validation set or split percentages for training, validation, and test. |
| Hardware Specification | No | The paper discusses simulated tasks and mentions training with TensorFlow, but it does not specify any hardware details such as CPU/GPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions "Our implementation in TensorFlow [1] is available online", but does not specify a version number for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | We define loss as the cross entropy between predicted and demonstrated action sequences and use RMSProp [35] for training. ... We used K = 20…116 depending on the problem size. We were able to transfer policies to larger environments by increasing K up to 450 when executing the policy. (A hedged code sketch of this training setup follows the table.) |
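
The Experiment Setup row quotes the paper's imitation-learning objective: cross entropy between predicted and demonstrated action sequences, optimized with RMSProp. The sketch below is not the authors' code; it is a minimal illustration of that training setup using the TensorFlow 2 Keras API (the released implementation uses TensorFlow 1.x), with a hypothetical stand-in network in place of the actual QMDP-net architecture and an assumed action count and learning rate.

```python
# Minimal sketch (not the authors' code): cross-entropy imitation loss between
# predicted and demonstrated actions, trained with RMSProp, as described in the
# Experiment Setup row. The policy network is a hypothetical placeholder.
import numpy as np
import tensorflow as tf

num_actions = 5  # assumption: number of discrete actions in the task

# Placeholder policy network; in the paper this would be the QMDP-net itself.
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_actions),  # unnormalized logits over actions
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-3)  # learning rate is an assumption

@tf.function
def train_step(observations, expert_actions):
    """One imitation-learning step on a batch of (observation, expert action) pairs."""
    with tf.GradientTape() as tape:
        logits = policy(observations, training=True)
        loss = loss_fn(expert_actions, logits)  # cross entropy vs. demonstrated actions
    grads = tape.gradient(loss, policy.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy.trainable_variables))
    return loss

# Example: one step on a dummy batch (shapes and feature size are assumptions).
obs = np.random.rand(32, 10).astype("float32")       # 32 samples, 10 features
acts = np.random.randint(0, num_actions, size=32)    # demonstrated actions
print(float(train_step(obs, acts)))
```

In the paper's setting, the `policy` placeholder would be the QMDP-net (a Bayesian filter module followed by K iterations of a value-iteration-style planner, with K = 20…116 during training), and the expert actions would come from the 5 demonstration trajectories per training environment mentioned in the Open Datasets row.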