QMDP-Net: Deep Learning for Planning under Partial Observability
Authors: Peter Karkus, David Hsu, Wee Sun Lee
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning. |
| Researcher Affiliation | Academia | Peter Karkus (1,2), David Hsu (1,2), Wee Sun Lee (2); 1: NUS Graduate School for Integrative Sciences and Engineering; 2: School of Computing, National University of Singapore; {karkus, dyhsu, leews}@comp.nus.edu.sg |
| Pseudocode | No | The paper describes the architecture and algorithms in text and figures, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation in Tensorflow [1] is available online at http://github.com/AdaCompNUS/qmdp-net. |
| Open Datasets | No | We trained a policy using expert trajectories from 10,000 random environments, 5 trajectories from each environment. |
| Dataset Splits | No | The paper mentions training and testing sets, but does not explicitly describe a separate validation set or split percentages for training, validation, and test. |
| Hardware Specification | No | The paper discusses simulated tasks and mentions training with TensorFlow, but it does not specify any hardware details such as CPU/GPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions "Our implementation in TensorFlow [1] is available online", but does not specify a version number for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | We define loss as the cross entropy between predicted and demonstrated action sequences and use RMSProp [35] for training. ... We used K = 20…116 depending on the problem size. We were able to transfer policies to larger environments by increasing K up to 450 when executing the policy. (A hedged code sketch of this training setup follows the table.) |
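
The Experiment Setup row quotes the paper's imitation-learning objective: cross entropy between predicted and demonstrated action sequences, optimized with RMSProp. The sketch below is not the authors' code; it is a minimal illustration of that training setup using the TensorFlow 2 Keras API (the released implementation uses TensorFlow 1.x), with a hypothetical stand-in network in place of the actual QMDP-net architecture and an assumed action count and learning rate.

```python
# Minimal sketch (not the authors' code): cross-entropy imitation loss between
# predicted and demonstrated actions, trained with RMSProp, as described in the
# Experiment Setup row. The policy network is a hypothetical placeholder.
import numpy as np
import tensorflow as tf

num_actions = 5  # assumption: number of discrete actions in the task

# Placeholder policy network; in the paper this would be the QMDP-net itself.
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_actions),  # unnormalized logits over actions
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=1e-3)  # learning rate is an assumption

@tf.function
def train_step(observations, expert_actions):
    """One imitation-learning step on a batch of (observation, expert action) pairs."""
    with tf.GradientTape() as tape:
        logits = policy(observations, training=True)
        loss = loss_fn(expert_actions, logits)  # cross entropy vs. demonstrated actions
    grads = tape.gradient(loss, policy.trainable_variables)
    optimizer.apply_gradients(zip(grads, policy.trainable_variables))
    return loss

# Example: one step on a dummy batch (shapes and feature size are assumptions).
obs = np.random.rand(32, 10).astype("float32")       # 32 samples, 10 features
acts = np.random.randint(0, num_actions, size=32)    # demonstrated actions
print(float(train_step(obs, acts)))
```

In the paper's setting, the `policy` placeholder would be the QMDP-net (a Bayesian filter module followed by K iterations of a value-iteration-style planner, with K = 20…116 during training), and the expert actions would come from the 5 demonstration trajectories per training environment mentioned in the Open Datasets row.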