SDRL: Interpretable and Data-Efficient Deep Reinforcement Learning Leveraging Symbolic Planning
Authors: Daoming Lyu, Fangkai Yang, Bo Liu, Steven Gustafson (pp. 2970-2977)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results validate the interpretability of subtasks, along with improved data efficiency compared with state-of-the-art approaches. |
| Researcher Affiliation | Collaboration | 1) Auburn University, Auburn, AL, USA; 2) Maana Inc., Bellevue, WA, USA. Emails: daoming.lyu@auburn.edu, fyang@maana.io, boliu@auburn.edu, sgustafson@maana.io |
| Pseudocode | Yes | Algorithm 1 SDRL Planning and Learning Loop (a hedged loop sketch follows the table) |
| Open Source Code | No | The paper does not provide any link or explicit statement about the availability of its source code. |
| Open Datasets | Yes | We use Taxi domain to demonstrate the behavior of intrinsically motivated planning, and on Montezuma's Revenge for interpretability and data-efficiency. |
| Dataset Splits | No | The paper describes experimental setups for RL environments but does not provide specific training/validation/test dataset splits, as is common for static datasets in supervised learning. |
| Hardware Specification | No | The paper mentions 'We thank the donation of GPU card from NVIDIA Corporation.' but does not specify the exact model of the GPU or other hardware components used for experiments. |
| Software Dependencies | No | The paper mentions software like CPLUS2ASP, CLINGO, and Arcade Learning Environment (ALE) but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Our experiment setup follows the DQN controller architecture (Kulkarni et al. 2016) with double-Q learning (Van Hasselt, Guez, and Silver 2016) and prioritized experience replay (Schaul et al. 2015). The architecture of the deep neural networks is shown in Table 1. The intrinsic reward follows (3) with φ = 1 and r = 1 when the agent loses its life. Extrinsic reward follows (4), where ψ = 100 and r(s, g) = 10 is defined for ϵ > 0.9 to encourage shorter plans. |
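
The Experiment Setup row quotes the reward-shaping constants but not the full forms of equations (3) and (4). The following is a minimal Python sketch under that limitation: only φ = 1, ψ = 100, and r(s, g) = 10 for ϵ > 0.9 come from the text, while the piecewise forms, the sign of the life-loss penalty, and reading ϵ as a subtask success rate are assumptions.

```python
# Minimal sketch of the reward shaping quoted in the Experiment Setup row.
# Only the constants phi = 1, psi = 100, and r(s, g) = 10 for epsilon > 0.9
# come from the excerpt; the piecewise forms, the sign of the life-loss
# penalty, and treating epsilon as a subtask success rate are assumptions,
# since equations (3) and (4) are not reproduced in this summary.

PHI = 1.0                # intrinsic bonus for completing a subtask (phi in eq. 3)
PSI = 100.0              # extrinsic scale in eq. (4); its exact role is assumed here
LIFE_LOSS_PENALTY = 1.0  # magnitude of r = 1 "when the agent loses its life"


def intrinsic_reward(subgoal_reached: bool, life_lost: bool) -> float:
    """Reward used by the DQN controller while learning a single subtask."""
    if life_lost:
        return -LIFE_LOSS_PENALTY  # assumed to act as a penalty
    return PHI if subgoal_reached else 0.0


def extrinsic_reward(success_rate: float, final_goal_reached: bool) -> float:
    """Reward fed back to the symbolic planner for a learned subtask.

    r(s, g) = 10 once the subtask's success rate (epsilon) exceeds 0.9,
    which biases the planner toward shorter plans; PSI is assumed here to
    be the bonus for reaching the final task goal.
    """
    if final_goal_reached:
        return PSI
    return 10.0 if success_rate > 0.9 else 0.0


if __name__ == "__main__":
    print(intrinsic_reward(subgoal_reached=True, life_lost=False))        # 1.0
    print(extrinsic_reward(success_rate=0.95, final_goal_reached=False))  # 10.0
```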
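
Because the source code is not released (Open Source Code row), the sketch below only illustrates how a planning-and-learning loop in the spirit of the paper's Algorithm 1 could be organized: a symbolic planner proposes subtasks, a DRL controller learns them with intrinsic rewards, and plan quality is fed back via extrinsic rewards. The interfaces `plan_fn`, `learn_subtask_fn`, `extrinsic_fn`, and `update_values_fn` are hypothetical placeholders, not the authors' API.

```python
# Hedged sketch of a planning-and-learning loop in the spirit of
# "Algorithm 1 SDRL Planning and Learning Loop". All interfaces are
# hypothetical placeholders; this is not the authors' implementation.
from typing import Callable, Dict, List, Tuple


def sdrl_loop(
    plan_fn: Callable[[Dict[str, float]], List[str]],       # symbolic planner -> subtask sequence
    learn_subtask_fn: Callable[[str], Tuple[float, bool]],   # DRL controller -> (intrinsic return, achieved)
    extrinsic_fn: Callable[[str, bool], float],              # extrinsic reward for a (subtask, outcome) pair
    update_values_fn: Callable[[List[str], float], None],    # feed plan quality back to the planner
    subtask_values: Dict[str, float],
    episodes: int,
) -> None:
    """Interleave symbolic planning, subtask learning, and plan evaluation."""
    for _ in range(episodes):
        # 1. Planning: propose a sequence of interpretable subtasks.
        plan = plan_fn(subtask_values)

        # 2. Learning: train the controller on each subtask with intrinsic
        #    rewards, accumulating extrinsic reward for the plan as a whole.
        plan_return = 0.0
        for subtask in plan:
            _, achieved = learn_subtask_fn(subtask)
            plan_return += extrinsic_fn(subtask, achieved)
            if not achieved:
                break  # abandon the remainder of a failed plan

        # 3. Evaluation: update subtask values so later plans favor subtasks
        #    the controller can reliably complete.
        update_values_fn(plan, plan_return)
```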