VeriX: Towards Verified Explainability of Deep Neural Networks
Authors: Min Wu, Haoze Wu, Clark Barrett
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on image recognition benchmarks and a real-world scenario of autonomous aircraft taxiing. ... We have implemented the VERIX algorithm in Python, using the Marabou neural network verification tool [31] to implement the CHECK sub-procedure of Algorithm 1 (Line 10). |
| Researcher Affiliation | Academia | Min Wu Department of Computer Science Stanford University minwu@cs.stanford.edu Haoze Wu Department of Computer Science Stanford University haozewu@cs.stanford.edu Clark Barrett Department of Computer Science Stanford University barrett@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1 VERIX (VERIfied eXplainability) |
| Open Source Code | Yes | The VERIX code is available at https://github.com/NeuralNetworkVerification/VeriX. |
| Open Datasets | Yes | We trained fully-connected and convolutional networks on the MNIST [34], GTSRB [47], and TaxiNet [29] datasets for classification and regression tasks. |
| Dataset Splits | No | The paper mentions using the MNIST, GTSRB, and TaxiNet datasets for training and testing, and refers to a 'test set' multiple times. However, it does not explicitly provide details about the training/validation/test dataset splits (e.g., percentages, sample counts, or specific predefined splits with citations) for its experiments. |
| Hardware Specification | Yes | Experiments were performed on a workstation equipped with AMD Ryzen 7 5700G CPUs running Fedora 37. ... Experiments were performed on a cluster equipped with Intel Xeon E5-2637 v4 CPUs running Ubuntu 16.04. |
| Software Dependencies | No | The paper mentions that the algorithm is implemented in 'Python' and uses the 'Marabou neural network verification tool [31]'. It also mentions 'TensorFlow [1]', 'Keras [9]', and the `tensorflow_ranking` package. While it specifies the operating systems 'Fedora 37' and 'Ubuntu 16.04', it does not provide specific version numbers for Python, Marabou, TensorFlow, Keras, or the `tensorflow_ranking` package, which are necessary for full reproducibility of software dependencies. |
| Experiment Setup | Yes | We set a time limit of 300 seconds for each CHECK call. ... In VERIX, ϵ is set to 5% for MNIST and 0.5% for GTSRB. ... magnitude ϵ is set to 3% across the Dense, Dense (large), CNN models and the MNIST, TaxiNet, GTSRB datasets for sensible comparison. ... The TaxiNet model has a mean absolute error of 0.824 on the test set, with no activation function in the last layer. ... TaxiNet deploys he_uniform as the kernel_initializer parameter in the intermediate dense and convolutional layers for task specific reason. |
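
The rows above refer to a CHECK sub-procedure and a perturbation magnitude ϵ without showing how they interact. The sketch below is one possible reading, not the paper's Algorithm 1: it assumes a loop that visits features in a fixed order and keeps a feature in the explanation whenever freeing it (together with the features already judged irrelevant) within ±ϵ can change the prediction. The function names `check_robust` and `verix_sketch` are hypothetical, and the check is a closed-form bound for a toy linear classifier standing in for a Marabou query. It ignores input-domain clipping and has no timeout, so the 300-second limit is not modelled.

```python
import numpy as np

def check_robust(W, b, x, free, eps):
    """Toy stand-in for CHECK, exact only for the linear model W @ x + b.

    Returns True if no perturbation of the `free` features within
    [x - eps, x + eps], with all other features held fixed, can make
    another class tie with or overtake the predicted class.
    """
    base = W @ x + b
    c = int(np.argmax(base))
    mask = np.zeros(x.shape[0], dtype=bool)
    mask[list(free)] = True
    for j in range(W.shape[0]):
        if j == c:
            continue
        d = W[j] - W[c]  # (logit_j - logit_c) is linear in x
        # Largest achievable value of logit_j - logit_c over the eps-box.
        worst = (base[j] - base[c]) + eps * np.abs(d[mask]).sum()
        if worst >= 0:
            return False
    return True

def verix_sketch(W, b, x, eps, order=None):
    """Assumed loop shape: split features into explanation / irrelevant."""
    order = range(len(x)) if order is None else order
    explanation, irrelevant = [], []
    for i in order:
        if check_robust(W, b, x, irrelevant + [i], eps):
            irrelevant.append(i)   # prediction provably unchanged
        else:
            explanation.append(i)  # freeing feature i can flip the output
    return explanation, irrelevant

# Hypothetical two-class, three-feature model.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.zeros(2)
x = np.array([1.0, 0.0, 0.5])

print(verix_sketch(W, b, x, eps=0.6))                   # ([1], [0, 2])
print(verix_sketch(W, b, x, eps=0.6, order=[2, 1, 0]))  # ([0], [2, 1])
```

In this toy setting the two calls return different explanations, so under the assumed loop the result depends on the order in which features are visited.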