reproducibilityindex.ai

VeriX: Towards Verified Explainability of Deep Neural Networks

Authors: Min Wu, Haoze Wu, Clark Barrett

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our method on image recognition benchmarks and a real-world scenario of autonomous aircraft taxiing. ... We have implemented the VERIX algorithm in Python, using the Marabou neural network verification tool [31] to implement the CHECK sub-procedure of Algorithm 1 (Line 10).
Researcher Affiliation	Academia	Min Wu, Department of Computer Science, Stanford University, minwu@cs.stanford.edu; Haoze Wu, Department of Computer Science, Stanford University, haozewu@cs.stanford.edu; Clark Barrett, Department of Computer Science, Stanford University, barrett@cs.stanford.edu
Pseudocode	Yes	Algorithm 1 VERIX (VERIfied eXplainability)
Open Source Code	Yes	The VERIX code is available at https://github.com/NeuralNetworkVerification/VeriX.
Open Datasets	Yes	We trained fully-connected and convolutional networks on the MNIST [34], GTSRB [47], and TaxiNet [29] datasets for classification and regression tasks.
Dataset Splits	No	The paper mentions using the MNIST, GTSRB, and TaxiNet datasets for training and testing, and refers to a 'test set' multiple times. However, it does not explicitly provide details about the training/validation/test dataset splits (e.g., percentages, sample counts, or specific predefined splits with citations) for its experiments.
Hardware Specification	Yes	Experiments were performed on a workstation equipped with AMD Ryzen 7 5700G CPUs running Fedora 37. ... Experiments were performed on a cluster equipped with Intel Xeon E5-2637 v4 CPUs running Ubuntu 16.04.
Software Dependencies	No	The paper states that the algorithm is implemented in 'Python' and uses the 'Marabou neural network verification tool [31]'. It also mentions 'TensorFlow [1]', 'Keras [9]', and the `tensorflow_ranking` package. While it names the operating systems 'Fedora 37' and 'Ubuntu 16.04', it does not give version numbers for Python, Marabou, TensorFlow, Keras, or `tensorflow_ranking`, which are necessary to fully reproduce the software environment.
Experiment Setup	Yes	We set a time limit of 300 seconds for each CHECK call. ... In VERIX, ϵ is set to 5% for MNIST and 0.5% for GTSRB. ... magnitude ϵ is set to 3% across the Dense, Dense (large), CNN models and the MNIST, TaxiNet, GTSRB datasets for sensible comparison. ... The TaxiNet model has a mean absolute error of 0.824 on the test set, with no activation function in the last layer. ... TaxiNet deploys he_uniform as the kernel_initializer parameter in the intermediate dense and convolutional layers for task specific reason.
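
The Software Dependencies row notes that no version numbers are reported. As a minimal sketch of how a reproduction attempt might record them, the script below writes the installed versions and the reported setup values into one manifest. The distribution names in `PACKAGES` (in particular `maraboupy` and `tensorflow-ranking`) and the helper names are assumptions for illustration and are not taken from the paper or the VeriX repository. The values in `REPORTED_SETUP` are copied from the Experiment Setup row.

```python
import json
import platform
from importlib import metadata

# Assumed distribution names; adjust them to match the actual environment.
PACKAGES = ["tensorflow", "keras", "tensorflow-ranking", "maraboupy"]

# Values quoted in the Experiment Setup row (epsilon as a fraction, not a percent).
REPORTED_SETUP = {
    "check_time_limit_seconds": 300,
    "epsilon": {"MNIST": 0.05, "GTSRB": 0.005},
}


def collect_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_manifest(packages=PACKAGES):
    """Collect interpreter, OS, package versions, and reported setup values."""
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": collect_versions(packages),
        "reported_setup": REPORTED_SETUP,
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be saved next to experiment outputs.
    print(json.dumps(build_manifest(), indent=2))
```

Packages that are not installed appear with a `null` version in the output, which makes gaps in the environment visible rather than silently skipped.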