Neuro-Symbolic Verification of Deep Neural Networks

Authors: Xuan Xie, Kristian Kersting, Daniel Neider

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experimental evaluation, we have considered two widely used datasets: 1. The MNIST dataset... 2. The German Traffic Sign Recognition Benchmark (GTSRB)... Figure 1 depicts the results of our experiments in terms of the accumulated average runtimes.
Researcher Affiliation | Academia | 1 Max Planck Institute for Software Systems, Kaiserslautern, Germany; 2 Computer Science Department and Centre for Cognitive Science, TU Darmstadt, Germany; 3 Hessian Center for AI (hessian.AI), Germany; 4 Carl von Ossietzky University of Oldenburg, Germany
Pseudocode | No | The paper describes the proposed framework and formalisms using natural language and mathematical notation, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and all experimental data can be found at https://github.com/LebronX/Neuro-Symbolic-Verification.
Open Datasets | Yes | The MNIST dataset [LeCun et al., 2010], containing 60,000 training images and 10,000 test images of handwritten digits. The German Traffic Sign Recognition Benchmark (GTSRB) [Stallkamp et al., 2011], containing 39,209 training images and 12,630 test images with 43 types of German traffic signs.
Dataset Splits | No | The paper specifies the number of training and test images for the MNIST and GTSRB datasets but does not explicitly mention a separate validation set or how data was split for validation purposes.
Hardware Specification | Yes | We have conducted our evaluation on an Intel Core i5-5350U CPU (1.80 GHz) with 8 GB RAM running Mac OS Catalina 10.15.7 with a timeout of 1,800 s per benchmark.
Software Dependencies | No | We have implemented a Python3 prototype, named Neuro-Symbolic Verifier (NSV), based on the state-of-the-art deductive verifier Marabou [Katz et al., 2019]. The paper mentions Python3 and Marabou but does not specify their version numbers for reproducibility.
Experiment Setup | Yes | We have resized the MNIST images to 14x14 for property P2 and the GTSRB images to 16x16 for all properties to keep the verification tractable. For property P2, we have chosen 0.05 <= epsilon <= 0.14 with step size 0.01 and 1 <= delta <= 20 with step size 1. For property P3, we have chosen 0.05 <= epsilon <= 0.14 with step size 0.01. Table 1: Architectures used in our experimental evaluation. Each line lists the number of neurons in the input layer (in), output layer (out), and hidden layers (hid.), respectively. We have repeated all experiments five times with different neural networks (trained using different parameters) and report the average results.
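The quoted parameter sweep can be sketched in Python. The grid bounds and step sizes are taken from the setup above; the variable names (`p2_grid`, `p3_grid`) are illustrative placeholders, not taken from the authors' code.

```python
# Sketch of the benchmark parameter grids described in the experiment setup
# (illustrative names, not the authors' actual code).
# Property P2 sweeps epsilon and delta jointly; P3 sweeps epsilon alone.
epsilons = [round(0.05 + 0.01 * i, 2) for i in range(10)]  # 0.05, 0.06, ..., 0.14
deltas = list(range(1, 21))                                # 1, 2, ..., 20

p2_grid = [(eps, d) for eps in epsilons for d in deltas]   # P2: epsilon x delta
p3_grid = [(eps,) for eps in epsilons]                     # P3: epsilon only

print(len(p2_grid), len(p3_grid))  # 200 10
```

At 200 P2 configurations and 10 P3 configurations per network, with a 1,800 s timeout per benchmark, the reported accumulated average runtimes plausibly required the reduced input resolutions mentioned above to stay tractable.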