SDRL: Interpretable and Data-Efficient Deep Reinforcement Learning Leveraging Symbolic Planning

Authors: Daoming Lyu, Fangkai Yang, Bo Liu, Steven Gustafson

AAAI 2019, pp. 2970-2977 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results validate the interpretability of subtasks, along with improved data efficiency compared with state-of-the-art approaches.
Researcher Affiliation | Collaboration | 1Auburn University, Auburn, AL, USA; 2Maana Inc., Bellevue, WA, USA; daoming.lyu@auburn.edu, fyang@maana.io, boliu@auburn.edu, sgustafson@maana.io
Pseudocode | Yes | Algorithm 1: SDRL Planning and Learning Loop
Open Source Code | No | The paper does not provide any link or explicit statement about the availability of its source code.
Open Datasets | Yes | We use the Taxi domain to demonstrate the behavior of intrinsically motivated planning, and Montezuma's Revenge for interpretability and data efficiency.
Dataset Splits | No | The paper describes experimental setups for RL environments but does not provide specific training/validation/test dataset splits, as is common for static datasets in supervised learning.
Hardware Specification | No | The paper mentions 'We thank the donation of GPU card from NVIDIA Corporation.' but does not specify the exact GPU model or other hardware components used for the experiments.
Software Dependencies | No | The paper mentions software like CPLUS2ASP, CLINGO, and Arcade Learning Environment (ALE) but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Our experiment setup follows the DQN controller architecture (Kulkarni et al. 2016) with double-Q learning (Van Hasselt, Guez, and Silver 2016) and prioritized experience replay (Schaul et al. 2015). The architecture of the deep neural networks is shown in Table 1. The intrinsic reward follows (3) with φ = 1 and r = 1 when the agent loses its life. The extrinsic reward follows (4), where ψ = 100 and r(s, g) = 10 is defined for ϵ > 0.9 to encourage shorter plans.
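
The Pseudocode row above confirms that the paper includes Algorithm 1, the SDRL planning-and-learning loop, but the report does not reproduce it. As a rough orientation only, the sketch below shows how a loop that interleaves a symbolic planner with DRL subtask controllers is commonly structured; the control flow and every name in it (generate_plan, train_subtask, update_quality, subtask_values) are assumptions and should not be read as the authors' actual algorithm.

```python
# Schematic sketch only: NOT the paper's Algorithm 1, just an illustration of a generic
# loop interleaving symbolic planning with DRL subtask learning. All names are hypothetical.

def sdrl_style_loop(symbolic_planner, drl_controller, env, episodes_per_subtask=100):
    subtask_values = {}              # measured quality of each symbolic subtask
    best_plan_reward = float("-inf")

    while True:
        # 1. Symbolic planning: ask the planner for a plan given current subtask values.
        plan = symbolic_planner.generate_plan(subtask_values)
        if plan is None:
            break  # no better plan can be found; stop

        # 2. DRL learning: train a sub-policy for each symbolic transition in the plan.
        plan_reward = 0.0
        for subtask in plan:
            success_rate, gained_reward = drl_controller.train_subtask(
                env, subtask, episodes=episodes_per_subtask)
            subtask_values[subtask] = success_rate
            plan_reward += gained_reward

        # 3. Plan evaluation: feed the measured plan quality back to the planner.
        symbolic_planner.update_quality(plan, plan_reward)
        best_plan_reward = max(best_plan_reward, plan_reward)

    return best_plan_reward
```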
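
The Experiment Setup row quotes the paper's reward shaping but not equations (3) and (4) themselves. The following minimal sketch shows one plausible way the quoted constants could enter intrinsic and extrinsic reward functions; the function signatures and the interpretation of ϵ as a subtask success rate are assumptions, not the paper's definitions.

```python
# Hypothetical sketch of the reward shaping quoted in the Experiment Setup row.
# Equations (3) and (4) are not reproduced in this report, so the functional forms below
# are assumptions; only the constants phi = 1, psi = 100, the r = 1 signal on losing a
# life, and the r(s, g) = 10 bonus for epsilon > 0.9 come from the quoted text.

def intrinsic_reward(subgoal_reached: bool, lost_life: bool, phi: float = 1.0) -> float:
    """Intrinsic reward for the DQN controller learning one subtask (assumed form of Eq. 3)."""
    if subgoal_reached:
        return phi   # phi = 1 per the quoted setup
    if lost_life:
        return 1.0   # the quoted text gives r = 1 when the agent loses its life
    return 0.0

def extrinsic_reward(env_reward: float, success_rate: float,
                     psi: float = 100.0, plan_bonus: float = 10.0) -> float:
    """Extrinsic reward used to score plans/subtasks (assumed form of Eq. 4)."""
    reward = psi * env_reward      # psi = 100 per the quoted setup
    if success_rate > 0.9:         # epsilon > 0.9 threshold, read here as a success rate
        reward += plan_bonus       # r(s, g) = 10 bonus, stated to encourage shorter plans
    return reward
```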