Provably Safe Neural Network Controllers via Differential Dynamic Logic

Authors: Samuel Teuber, Stefan Mitsch, André Platzer

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In our evaluation we demonstrate the versatility of Ver SAILLE and Mosaic: We prove infinite-time safety on the classical Vertical Airborne Collision Avoidance NNCS verification benchmark for some scenarios while (exhaustively) enumerating counterexample regions in unsafe scenarios. We also show that our approach significantly outperforms the State-of-the-Art tools in closed-loop NNV.
Researcher Affiliation Academia 1 Karlsruhe Institute of Technology 2 De Paul University 3 Carnegie Mellon University
Pseudocode Yes Algorithm 1: Verification of nonlinear queries on piece-wise linear NNs: ENUM internally uses off-the-shelf open-loop NNV tools for the verification of linear normalized queries on NNs.
Open Source Code Yes We implemented Mosaic for Re LU NNs in a new Julia [16] tool called N3V based on the software packages nnenum [10,12], Pico SAT [17,19] and Z3 [32,52]. We provide an artifact for the N3V tool... All code and data can be found on Git Hub: https://github.com/ samysweb/NCube V. We also provide an archived version of our artifact with DOI [91].
Open Datasets Yes The common NNCS safety benchmark of Adaptive Cruise Control [22,39,51] will serve as running example to demonstrate the introduced concepts.
Dataset Splits No The paper mentions training and testing but does not explicitly describe a validation dataset split or methodology.
Hardware Specification Yes We provide wall-clock times on an AMD Ryzen 7 PRO 5850U CPU (N3V is sequential; nnenum uses multithreading).
Software Dependencies Yes We implemented Mosaic for Re LU NNs in a new Julia [16] tool called N3V based on the software packages nnenum [10,12], Pico SAT [17,19] and Z3 [32,52]. The artifact is equipped with scripts for building the tool (./build.sh <julia 1.10 path>)
Experiment Setup Yes The analyses took 47 to 300 seconds depending on the NN and approximation. The runtimes show mixed results for tighter approximations: While tighter approximations (i.e. a higher N) sometimes improves performance (e.g. for ACC Large retrained), it can also harm performance (e.g. as seen for ACC Large). We suspected that this is a combination of two factors. First, finer approximations yield a larger number of queries which may increase the overall overhead. Secondly, our adjustments to OVERT s approximation using approximation of piece-wise linearities (see Appendix B.1) may in some cases worsen the approximation in comparison to a lower N. We leave a more fine-grained analysis of approximation techniques to future work and focus our analysis on approximations with N = 1. For our verification, we set T = 0.1 (note that this is a bound on the frequency of control decisions, not a time horizon) and A = B = 100.