Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction

Authors: Quentin Delfosse, Hikaru Shindo, Devendra Singh Dhami, Kristian Kersting

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental evaluation demonstrates that NUDGE agents can induce interpretable and explainable policies while outperforming purely neural ones and showing good flexibility to environments with different initial states and problem sizes. Section 4 (Experimental Evaluation): We here compare neural agents' performance to NUDGE's, showcase NUDGE's interpretable policies, and show its ability to report the importance of each input on its decisions, i.e., explainable logic policies.
Researcher Affiliation | Academia | Quentin Delfosse (Technical University of Darmstadt; National Research Center for Applied Cybersecurity), quentin.delfosse@tu-darmstadt.de; Hikaru Shindo (Technical University of Darmstadt), hikaru.shindo@tu-darmstadt.de; Devendra Singh Dhami (Eindhoven University of Technology; Hessian Center for AI, hessian.AI), d.s.dhami@tue.nl; Kristian Kersting (Technical University of Darmstadt; Hessian Center for AI, hessian.AI; German Research Center for AI, DFKI), kersting@cs.tu-darmstadt.de
Pseudocode | Yes | Appendix A.1 (Algorithm of Neurally-Guided Symbolic Abstraction): We show the algorithm of neurally-guided symbolic abstraction in Algorithm 1. A hedged sketch of this kind of search loop is given below, after the table.
Open Source Code | Yes | Code publicly available: https://github.com/k4ntz/NUDGE.
Open Datasets | Yes | We make use of the Object-Centric Atari library [Delfosse et al., 2023]. As Atari games do not embed logic challenges, but are rather designed to test the reflexes of human players, we also created 3 logic-oriented environments. We thus modified environments from Procgen [Mohanty et al., 2020] to have object-centric representations; these modified environments are open-sourced along with our evaluation. A usage sketch of the object-centric Atari library follows the table.
Dataset Splits | No | The paper mentions training steps and total training duration but does not provide specific train/validation/test dataset splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments.
Software Dependencies | No | The paper mentions the Adam optimizer and PyTorch but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | Appendix D.1 (Hyperparameters): We set the clip parameter ε_clip = 0.2 and the discount factor γ = 0.99. We use the Adam optimizer, with 1e-3 as the actor learning rate and 3e-4 as the critic learning rate. The episode length is 500 timesteps, and the policy is updated every 1000 steps. We train every algorithm for 800k steps on each environment, apart from neural PPO, which needed 5M steps on Loot. We use an epsilon-greedy strategy with ε = max(e^(-episode/500), 0.02). These settings are restated as a config sketch after the table.
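
The Pseudocode row above points to Algorithm 1 (Neurally-Guided Symbolic Abstraction) in Appendix A.1. As a rough illustration of the kind of procedure that name suggests, here is a minimal, hedged Python sketch of a beam-style clause search scored against a neural policy. All identifiers (refine, score, initial_clauses, beam_width) are hypothetical placeholders, not the authors' implementation.

```python
# Hedged sketch of a neurally-guided clause search in the spirit of Algorithm 1.
# The helpers passed in (refine, score) and all names are hypothetical.

def neurally_guided_abstraction(initial_clauses, neural_policy, states,
                                refine, score, n_iterations=3, beam_width=5):
    """Iteratively specialise candidate clauses, keeping those whose induced
    behaviour agrees best with a pretrained neural policy on sampled states."""
    beam = list(initial_clauses)
    for _ in range(n_iterations):
        candidates = set(beam)
        for clause in beam:
            candidates.update(refine(clause))   # generate specialisations of each clause
        # rank all candidates by agreement with the neural policy on the sampled states
        ranked = sorted(candidates,
                        key=lambda c: score(c, neural_policy, states),
                        reverse=True)
        beam = ranked[:beam_width]              # keep only the most promising clauses
    return beam
```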
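The Open Datasets row mentions the Object-Centric Atari library (OCAtari). Below is a minimal usage sketch assuming a Gymnasium-style OCAtari API; the constructor arguments, mode name, and object attributes (category, x, y) reflect one reading of that library and may differ across versions.

```python
# Hedged illustration only: exact OCAtari arguments and attribute names are assumptions.
from ocatari.core import OCAtari

env = OCAtari("Pong", mode="ram", hud=False)      # object-centric wrapper around an Atari game
obs, info = env.reset()
for _ in range(100):
    action = env.action_space.sample()            # random actions, purely for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    # env.objects lists the detected game objects (player, ball, ...) with positions;
    # this object-centric state is what a logic policy such as NUDGE reasons over.
    for obj in env.objects:
        print(obj.category, obj.x, obj.y)
    if terminated or truncated:
        obs, info = env.reset()
```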
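The Experiment Setup row lists the training hyperparameters from Appendix D.1. For convenience, they are restated here as a plain Python config plus the epsilon-greedy schedule; the exponential-decay form of the schedule is a reconstruction of the garbled formula in the extracted text, and the dictionary keys are illustrative names, not the authors' code.

```python
import math

# Hyperparameters as reported in Appendix D.1 (key names are illustrative).
PPO_CONFIG = {
    "clip_eps": 0.2,          # PPO clipping parameter
    "gamma": 0.99,            # discount factor
    "actor_lr": 1e-3,         # Adam learning rate for the actor
    "critic_lr": 3e-4,        # Adam learning rate for the critic
    "episode_length": 500,    # timesteps per episode
    "update_every": 1000,     # environment steps between policy updates
    "total_steps": 800_000,   # neural PPO needed 5M steps on Loot
}

def epsilon(episode: int) -> float:
    """Epsilon-greedy exploration: eps = max(exp(-episode / 500), 0.02)."""
    return max(math.exp(-episode / 500), 0.02)
```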