QMDP-Net: Deep Learning for Planning under Partial Observability

Authors: Peter Karkus, David Hsu, Wee Sun Lee

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Notably, although QMDP-net encodes the QMDP algorithm, it sometimes outperforms QMDP in the experiments as a result of end-to-end learning (see the QMDP sketch after the table).
Researcher Affiliation | Academia | Peter Karkus (1,2), David Hsu (1,2), Wee Sun Lee (2). (1) NUS Graduate School for Integrative Sciences and Engineering; (2) School of Computing, National University of Singapore. {karkus, dyhsu, leews}@comp.nus.edu.sg
Pseudocode | No | The paper describes the architecture and algorithms in text and figures but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "Our implementation in TensorFlow [1] is available online at http://github.com/AdaCompNUS/qmdp-net."
Open Datasets | No | The paper trains on self-generated data rather than a public dataset: "We trained a policy using expert trajectories from 10,000 random environments, 5 trajectories from each environment."
Dataset Splits | No | The paper mentions training and testing sets but does not explicitly describe a separate validation set or give split percentages for training, validation, and test.
Hardware Specification | No | The paper discusses simulated tasks and mentions training with TensorFlow, but it does not specify any hardware details such as CPU/GPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper states "Our implementation in TensorFlow [1] is available online" but does not give a version number for TensorFlow or any other software dependency.
Experiment Setup | Yes | "We define loss as the cross entropy between predicted and demonstrated action sequences and use RMSProp [35] for training. ... We used K = 20 ... 116 depending on the problem size. We were able to transfer policies to larger environments by increasing K up to 450 when executing the policy." (A training-setup sketch follows the table.)
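For context on what the table's references to the QMDP algorithm and the iteration count K mean, here is a minimal NumPy sketch of the classical QMDP approximation that QMDP-net unrolls as network layers. This is not the paper's released code; the function and variable names (`qmdp_action`, `T`, `R`) and all numeric values are chosen for illustration.

```python
import numpy as np

def qmdp_action(T, R, belief, K=20, gamma=0.99):
    """Pick an action with the QMDP approximation.

    T      : (A, S, S) transition tensor, T[a, s, s2] = P(s2 | s, a)
    R      : (S, A) reward matrix
    belief : (S,) probability vector over states
    K      : number of value-iteration steps; QMDP-net unrolls this
             recurrence as K layers of its planning module
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(K):
        V = Q.max(axis=1)                          # V(s) = max_a Q(s, a)
        # Q(s, a) = R(s, a) + gamma * sum_{s2} T[a, s, s2] * V(s2)
        Q = R + gamma * np.einsum('asn,n->sa', T, V)
    # QMDP step: weight the fully observable Q-values by the belief.
    return int(np.argmax(belief @ Q))

# Tiny 2-state, 2-action example (made-up numbers, illustration only).
T = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
print(qmdp_action(T, R, belief=np.array([0.7, 0.3]), K=50))
```

Larger K lets value information propagate further through the state space, which is why the paper can transfer a trained policy to bigger environments simply by increasing K at execution time.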
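The Experiment Setup row quotes the loss and optimizer but no code. Below is a minimal TensorFlow 1.x-style sketch of that setup, consistent with the paper's era; the dense layer stands in for the QMDP-net forward pass, and the feature size, action count, and learning rate are assumptions, not values from the paper.

```python
import tensorflow as tf  # TensorFlow 1.x-style API

NUM_ACTIONS = 5  # assumption; depends on the task

# Stand-in for the QMDP-net forward pass: a single dense layer over
# placeholder features, purely to make the sketch self-contained.
features = tf.placeholder(tf.float32, shape=[None, 64])
demo_actions = tf.placeholder(tf.int32, shape=[None])  # expert action labels
action_logits = tf.layers.dense(features, NUM_ACTIONS)

# Cross entropy between predicted and demonstrated actions, as in the paper.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=demo_actions, logits=action_logits))

# RMSProp optimizer, as stated in the paper; learning rate is an assumption.
train_op = tf.train.RMSPropOptimizer(learning_rate=1e-3).minimize(loss)
```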