Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction
Authors: Quentin Delfosse, Hikaru Shindo, Devendra Dhami, Kristian Kersting
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation demonstrates that NUDGE agents can induce interpretable and explainable policies while outperforming purely neural ones and showing good flexibility to environments of different initial states and problem sizes. (Section 4, Experimental Evaluation:) We here compare neural agents' performances to NUDGE ones, showcase NUDGE's interpretable policies and its ability to report the importance of each input on their decisions, i.e. explainable logic policies. |
| Researcher Affiliation | Academia | Quentin Delfosse (Technical University of Darmstadt; National Research Center for Applied Cybersecurity), quentin.delfosse@tu-darmstadt.de; Hikaru Shindo (Technical University of Darmstadt), hikaru.shindo@tu-darmstadt.de; Devendra Singh Dhami (Eindhoven University of Technology; Hessian Center for AI, hessian.AI), d.s.dhami@tue.nl; Kristian Kersting (Technical University of Darmstadt; Hessian Center for AI, hessian.AI; German Research Center for AI, DFKI), kersting@cs.tu-darmstadt.de |
| Pseudocode | Yes | Appendix A.1 (Algorithm of Neurally-Guided Symbolic Abstraction): We show the algorithm of neurally-guided symbolic abstraction in Algorithm 1. (Algorithm 1: Neurally-Guided Symbolic Abstraction) |
| Open Source Code | Yes | Code publicly available: https://github.com/k4ntz/NUDGE. (footnote 3) |
| Open Datasets | Yes | We make use of the Object-Centric Atari library [Delfosse et al., 2023]. As Atari games do not embed logic challenges, but are rather designed to test the reflexes of human players, we also created 3 logic-oriented environments. We thus modified environments from Procgen [Mohanty et al., 2020] to have object-centric representations; these are open-sourced along with our evaluation. (An illustrative sketch of such an object-centric state is given after this table.) |
| Dataset Splits | No | The paper mentions training steps and total training duration but does not provide specific train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments. |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'PyTorch' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | D.1 Hyperparameters: We here provide the hyperparameters used in our experiments. We set the clip parameter ϵ_clip = 0.2 and the discount factor γ = 0.99. We use the Adam optimizer with an actor learning rate of 1e-3 and a critic learning rate of 3e-4. The episode length is 500 timesteps and the policy is updated every 1000 steps. We train every algorithm for 800k steps on each environment, apart from neural PPO, which needed 5M steps on Loot. We use an epsilon-greedy strategy with ϵ = max(e^(-episode/500), 0.02). (A hedged configuration sketch of these hyperparameters follows the table.) |
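
The training setup reported in the Experiment Setup row can be summarized in a small configuration sketch. The dictionary keys and the `epsilon` helper below are assumed names for illustration only; they do not mirror the NUDGE repository's actual configuration files.

```python
# Minimal sketch of the reported PPO training hyperparameters.
# Key names are assumptions for illustration, not the NUDGE config format.
import math

config = {
    "clip_eps": 0.2,         # PPO clipping parameter (eps_clip in the paper)
    "gamma": 0.99,           # discount factor
    "optimizer": "Adam",
    "actor_lr": 1e-3,
    "critic_lr": 3e-4,
    "episode_length": 500,   # timesteps per episode
    "update_every": 1000,    # environment steps between policy updates
    "total_steps": 800_000,  # 5M for neural PPO on Loot, per the paper
}

def epsilon(episode: int) -> float:
    """Epsilon-greedy schedule as reported: eps = max(exp(-episode / 500), 0.02)."""
    return max(math.exp(-episode / 500), 0.02)

print(epsilon(0))     # 1.0 at the start of training
print(epsilon(2000))  # decays until it floors at 0.02
```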
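
To make the "object-centric representations" mentioned in the Open Datasets row concrete, here is a minimal, hypothetical sketch of such a state: a frame is described as a list of typed objects rather than raw pixels, over which a logic policy can evaluate relational predicates. The dataclass, field names, and the `closeby` predicate are assumptions for illustration, not the actual NUDGE or OCAtari data structures.

```python
# Hypothetical object-centric game state, in the spirit of the object-centric
# Atari / Procgen representations cited in the paper. All names are illustrative.
from dataclasses import dataclass

@dataclass
class GameObject:
    category: str  # e.g. "player", "key", "door", "enemy"
    x: float       # horizontal position on screen
    y: float       # vertical position on screen

def closeby(a: GameObject, b: GameObject, threshold: float = 16.0) -> bool:
    """Toy spatial predicate over two objects (illustrative only)."""
    return abs(a.x - b.x) + abs(a.y - b.y) < threshold

# A single frame becomes a set of objects instead of raw pixels.
state = [
    GameObject("player", 12.0, 40.0),
    GameObject("key", 20.0, 44.0),
    GameObject("door", 90.0, 40.0),
]

player, key, door = state
print(closeby(player, key))   # True  -> a logic rule such as "collect key" could fire
print(closeby(player, door))  # False
```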