Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Formal Explanations of Neural Network Policies for Planning
Authors: Renee Selvey, Alban Grastien, Sylvie Thiébaux
IJCAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experimental results of our implementation of this approach for ASNet policies for classical planning domains. |
| Researcher Affiliation | Academia | ¹School of Computing, The Australian National University; ²LAAS-CNRS, ANITI, Université de Toulouse |
| Pseudocode | Yes | Algorithm 1 Computing a minimal explanation for a sequence of decisions. |
| Open Source Code | Yes | For reproducibility, our repository https://github.com/ReneeSelvey/policy-explanations provides our algorithm implementation, benchmarks used, learnt policies, and the scripts to learn them and run the experiments. |
| Open Datasets | Yes | We took all deterministic domains and training instances from the code distributions of [Toyer et al., 2020] and [Steinmetz et al., 2022]. |
| Dataset Splits | No | The paper mentions generating problems for evaluation but does not specify a validation split or its details for the experimental data used in this paper. |
| Hardware Specification | Yes | All experiments were run on a machine with an AMD Ryzen Threadripper 3990X CPU, with 64 cores/128 threads, a clock speed of 2.9 GHz base, 4.3 GHz max boost, and 128 GB of memory of which we used 64 GB. |
| Software Dependencies | Yes | Gurobi version 9.1.2 is the MIP solver used for the experiments. |
| Experiment Setup | Yes | To ensure the model is accurate enough for our experiments, we set the integer feasibility tolerance (IntFeasTol) to 10⁻⁹ and the error for function approximations (FuncPieceError) to 10⁻⁶. ... Each explanation problem was run with a time limit of 3h, except for Gripper for which the timeout was 4h. |
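
The solver settings quoted in the Experiment Setup row could be captured in a small configuration sketch like the one below. This is an illustration, not the authors' code: the helper names `configure_solver` and `time_limit_seconds` are hypothetical, and the sketch assumes the two tolerances are applied through a model object's `setParam(name, value)` method, as `gurobipy.Model` provides. How the paper enforced its per-problem time limit is not stated, so the limit is only returned as a number here rather than passed to the solver.

```python
# Values taken from the Experiment Setup row; everything else is illustrative.
GUROBI_PARAMS = {
    "IntFeasTol": 1e-9,      # integer feasibility tolerance
    "FuncPieceError": 1e-6,  # error bound for function approximations
}

DEFAULT_TIME_LIMIT_S = 3 * 3600                  # 3h for each explanation problem
TIME_LIMIT_OVERRIDES_S = {"gripper": 4 * 3600}   # 4h for Gripper


def time_limit_seconds(domain: str) -> int:
    """Return the per-problem time limit in seconds for a domain name."""
    return TIME_LIMIT_OVERRIDES_S.get(domain.lower(), DEFAULT_TIME_LIMIT_S)


def configure_solver(model) -> None:
    """Apply the tolerances to any object exposing setParam(name, value).

    Assumption: `model` behaves like gurobipy.Model in this respect.
    """
    for name, value in GUROBI_PARAMS.items():
        model.setParam(name, value)
```

With Gurobi installed, a call might look like `configure_solver(gurobipy.Model())`. Keeping the time limit separate from the solver parameters avoids assuming whether the limit applied to a single MIP solve or to the whole explanation procedure.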