Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
QMDP-Net: Deep Learning for Planning under Partial Observability
Authors: Peter Karkus, David Hsu, Wee Sun Lee
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning. |
| Researcher Affiliation | Academia | Peter Karkus (1,2), David Hsu (1,2), Wee Sun Lee (2). (1) NUS Graduate School for Integrative Sciences and Engineering; (2) School of Computing, National University of Singapore. EMAIL |
| Pseudocode | No | The paper describes the architecture and algorithms in text and figures, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation in Tensorflow [1] is available online at http://github.com/AdaCompNUS/qmdp-net. |
| Open Datasets | No | We trained a policy using expert trajectories from 10,000 random environments, 5 trajectories from each environment. |
| Dataset Splits | No | The paper mentions training and testing sets, but does not explicitly describe a separate validation set or split percentages for training, validation, and test. |
| Hardware Specification | No | The paper discusses simulated tasks and mentions training with TensorFlow, but it does not specify any hardware details such as CPU/GPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions "Our implementation in TensorFlow [1] is available online", but does not specify a version number for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | We define loss as the cross entropy between predicted and demonstrated action sequences and use RMSProp [35] for training. ... We used K = 20...116 depending on the problem size. We were able to transfer policies to larger environments by increasing K up to 450 when executing the policy. |
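
The Experiment Setup row names two ingredients: a cross-entropy loss between predicted and demonstrated action sequences, and RMSProp for training. The following is a minimal NumPy sketch of those two pieces only, not the authors' TensorFlow implementation. The function names, the toy problem in `fit_toy`, and the hyperparameters (`lr`, `decay`, `eps`, `steps`) are assumptions made for illustration; the quoted text does not report them. For simplicity the sketch treats the per-step action logits themselves as the trainable parameters, whereas in the paper they would be produced by the network.

```python
import numpy as np


def sequence_cross_entropy(logits, actions):
    """Mean cross entropy between predicted logits (T, A) and demonstrated actions (T,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(actions)), actions].mean()


def cross_entropy_grad(logits, actions):
    """Gradient of the mean cross entropy with respect to the logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(actions)), actions] -= 1.0  # softmax minus one-hot target
    return probs / len(actions)


def rmsprop_step(param, grad, cache, lr=1e-2, decay=0.9, eps=1e-8):
    """One RMSProp update; lr, decay and eps are illustrative values, not the paper's."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache


def fit_toy(steps=200, seed=0):
    """Hypothetical toy problem: fit logits directly to one demonstrated trajectory."""
    rng = np.random.default_rng(seed)
    num_steps, num_actions = 6, 4
    actions = rng.integers(0, num_actions, size=num_steps)  # stand-in expert actions
    logits = np.zeros((num_steps, num_actions))
    cache = np.zeros_like(logits)
    losses = []
    for _ in range(steps):
        losses.append(sequence_cross_entropy(logits, actions))
        grad = cross_entropy_grad(logits, actions)
        logits, cache = rmsprop_step(logits, grad, cache)
    losses.append(sequence_cross_entropy(logits, actions))
    return losses, logits, actions


if __name__ == "__main__":
    losses, logits, actions = fit_toy()
    print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}")
```

Starting from all-zero logits the loss equals the log of the number of actions, and it should fall as the updates push probability mass toward the demonstrated actions.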