Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

VeriX: Towards Verified Explainability of Deep Neural Networks

Authors: Min Wu, Haoze Wu, Clark Barrett

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on image recognition benchmarks and a real-world scenario of autonomous aircraft taxiing. ... We have implemented the VeriX algorithm in Python, using the Marabou neural network verification tool [31] to implement the CHECK sub-procedure of Algorithm 1 (Line 10).
Researcher Affiliation | Academia | Min Wu, Department of Computer Science, Stanford University, EMAIL; Haoze Wu, Department of Computer Science, Stanford University, EMAIL; Clark Barrett, Department of Computer Science, Stanford University, EMAIL
Pseudocode | Yes | Algorithm 1 VeriX (VERIfied eXplainability)
Open Source Code | Yes | The VeriX code is available at https://github.com/NeuralNetworkVerification/VeriX.
Open Datasets | Yes | We trained fully-connected and convolutional networks on the MNIST [34], GTSRB [47], and TaxiNet [29] datasets for classification and regression tasks.
Dataset Splits | No | The paper mentions using the MNIST, GTSRB, and TaxiNet datasets for training and testing, and refers to a "test set" several times. However, it does not explicitly describe the training/validation/test splits used in its experiments (e.g., percentages, sample counts, or predefined splits with citations).
Hardware Specification | Yes | Experiments were performed on a workstation equipped with AMD Ryzen 7 5700G CPUs running Fedora 37. ... Experiments were performed on a cluster equipped with Intel Xeon E5-2637 v4 CPUs running Ubuntu 16.04.
Software Dependencies | No | The paper states that the algorithm is implemented in Python and uses the Marabou neural network verification tool [31]. It also mentions TensorFlow [1], Keras [9], and the tensorflow_ranking package. Although it names the operating systems (Fedora 37 and Ubuntu 16.04), it gives no version numbers for Python, Marabou, TensorFlow, Keras, or tensorflow_ranking, which are needed to reproduce the software environment.
Experiment Setup | Yes | We set a time limit of 300 seconds for each CHECK call. ... In VeriX, ϵ is set to 5% for MNIST and 0.5% for GTSRB. ... magnitude ϵ is set to 3% across the Dense, Dense (large), CNN models and the MNIST, TaxiNet, GTSRB datasets for sensible comparison. ... The TaxiNet model has a mean absolute error of 0.824 on the test set, with no activation function in the last layer. ... TaxiNet deploys he_uniform as the kernel_initializer parameter in the intermediate dense and convolutional layers for task specific reason.
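Algorithm 1 and its CHECK sub-procedure are only named in this report. As we understand the paper's description, the algorithm walks the input features in a chosen traversal order and tries to add each feature to an "irrelevant" set. A feature stays irrelevant only if CHECK proves that perturbing every irrelevant feature by ±ϵ cannot change the prediction; otherwise it joins the explanation. The sketch below illustrates that loop under loud assumptions: `linear_check` and `verix_like` are hypothetical names, CHECK is replaced by a closed-form test that is exact only for a linear classifier (the paper uses Marabou on real networks), and the traversal order is passed in rather than computed with the paper's heuristic.

```python
from typing import Callable, List, Sequence, Set, Tuple


def linear_check(weights: Sequence[Sequence[float]], biases: Sequence[float],
                 x: Sequence[float], free: Set[int], eps: float) -> bool:
    """Hypothetical stand-in for CHECK, exact only for a linear classifier.

    Returns True if every input within +/- eps of x on the features in `free`
    (and equal to x on all other features) keeps the predicted class.
    """
    scores = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    c = max(range(len(scores)), key=lambda k: scores[k])  # predicted class at x
    for j in range(len(scores)):
        if j == c:
            continue
        # The gap s_c - s_j is linear in the input, so its minimum over the box
        # is the gap at x minus eps * sum of |w_c,i - w_j,i| over free features.
        worst_gap = scores[c] - scores[j] - eps * sum(
            abs(weights[c][i] - weights[j][i]) for i in free)
        if worst_gap <= 0:  # a tie or flip is reachable: not provably invariant
            return False
    return True


def verix_like(order: Sequence[int],
               check: Callable[[Set[int]], bool]) -> Tuple[List[int], Set[int]]:
    """Deletion-style traversal: try to free each feature in `order`."""
    explanation: List[int] = []   # features that must stay fixed
    irrelevant: Set[int] = set()  # features allowed to vary by +/- eps
    for i in order:
        candidate = irrelevant | {i}
        if check(candidate):
            irrelevant = candidate    # prediction still provably invariant
        else:
            explanation.append(i)     # freeing i could change the prediction
    return explanation, irrelevant


if __name__ == "__main__":
    W = [[1.0, 0.25, 2.0], [0.0, 0.0, 0.0]]  # toy 2-class, 3-feature model
    b = [0.0, 0.0]
    x = [1.0, 1.0, 1.0]
    check = lambda free: linear_check(W, b, x, free, eps=1.5)
    print(verix_like([1, 0, 2], check))  # ([2], {0, 1})
    print(verix_like([2, 1, 0], check))  # ([1, 0], {2})
```

The two calls show why the traversal order matters: the same model and ϵ yield different (each valid) explanations depending on which features are tried first.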
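The Experiment Setup row gives ϵ as a percentage. One plausible reading, which is an assumption rather than something the report confirms, is that inputs are normalized to [0, 1] so that 5% means each free feature may move by 0.05. The sketch records the reported settings and turns ϵ into per-feature intervals. The constant names and `perturbation_bounds` are hypothetical, and clipping to the valid input range is likewise assumed.

```python
from typing import List, Set, Tuple

# Settings as reported in the paper's experiment setup. The names and layout
# are illustrative assumptions, not VeriX's actual configuration format.
CHECK_TIMEOUT_S = 300                            # time limit per CHECK call
EPSILON_VERIX = {"MNIST": 0.05, "GTSRB": 0.005}  # 5% and 0.5%
EPSILON_COMPARISON = 0.03                        # 3% for the cross-model comparison


def perturbation_bounds(x: List[float], free: Set[int], eps: float,
                        lo: float = 0.0, hi: float = 1.0) -> List[Tuple[float, float]]:
    """Interval for each input feature under a box (L-infinity style) perturbation.

    Assumes inputs are normalized to [lo, hi] and perturbed values are clipped
    to that range. Features not in `free` are pinned to their original value.
    """
    bounds: List[Tuple[float, float]] = []
    for i, v in enumerate(x):
        if i in free:
            bounds.append((max(lo, v - eps), min(hi, v + eps)))
        else:
            bounds.append((v, v))
    return bounds
```

For example, `perturbation_bounds([0.0, 0.5, 1.0], {0, 2}, 0.25)` returns `[(0.0, 0.25), (0.5, 0.5), (0.75, 1.0)]`: the free features are widened and clipped, and feature 1 stays fixed.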
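The Software Dependencies row is marked "No" because no package versions are given. A lightweight remedy, sketched here as a suggestion rather than anything the paper does, is to log installed versions alongside the results using the standard library's `importlib.metadata`. The distribution names in `DEFAULT_PACKAGES` (for instance `maraboupy` for Marabou's Python bindings) and the helper name `record_versions` are assumptions.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names are assumptions; replace them with whatever the
# experiments actually installed.
DEFAULT_PACKAGES = ("tensorflow", "keras", "tensorflow-ranking", "maraboupy")


def record_versions(packages: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Map "python" and each package name to its installed version, or None."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in the current environment
    return versions


if __name__ == "__main__":
    # Emit requirements-style lines that could be saved next to the results.
    for name, version in record_versions().items():
        print(f"{name}=={version}" if version else f"# {name}: not installed")
```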