Relational Verification Leaps Forward with RABBit

Authors: Tarun Suresh, Debangshu Banerjee, Gagandeep Singh

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform extensive experiments on popular datasets and various DNNs (standard and robustly trained) to showcase the precision improvement over the current SOTA baselines. We evaluate the effectiveness of RABBit on multiple relational properties such as UAP accuracy (Table 1) and top-k accuracy (Appendix Table 5), DNNs, and datasets."
Researcher Affiliation | Collaboration | Tarun Suresh (1), Debangshu Banerjee (1), Gagandeep Singh (1,2); 1: University of Illinois Urbana-Champaign, 2: VMware Research
Pseudocode | Yes | "In this section, we detail the algorithm (Algo. 1) that combines the results from strong bounding and strong branching to formulate the MILP." (Algorithm 1: RABBit)
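As a rough illustration of that combination step (this is my own sketch, not the authors' code; the function name and list-based bound representation are hypothetical), a sound way to merge results from two refinement procedures is to keep, per neuron, the tighter of the two lower bounds before encoding them as MILP constraints:

```python
def combine_bounds(bounding_lbs, branching_lbs):
    """Elementwise-tightest lower bounds from two refinement procedures.

    A lower bound produced by either sound procedure remains valid, so
    taking the per-neuron maximum yields the tightest combined bound
    that can then be fed into the MILP encoding.
    """
    return [max(a, b) for a, b in zip(bounding_lbs, branching_lbs)]

# Toy example: lower bounds for three neurons from each procedure.
lbs = combine_bounds([-1.0, -0.5, 0.2], [-0.8, -0.7, 0.3])
```

Tighter intermediate bounds shrink the MILP's feasible region, which is why combining the two procedures can improve precision over either alone.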
Open Source Code | Yes | "Code is at this URL. We provide the code to replicate the main results of this paper."
Open Datasets | Yes | Table 1 (RABBit Efficacy Analysis for Worst-Case UAP Accuracy) lists CIFAR10 and MNIST. "Our experiments utilize publicly available pre-trained DNNs sourced from the CROWN repository [Zhang et al., 2020], the α,β-CROWN repository [Wang et al., 2021], and the ERAN repository [Singh et al., 2019b]."
Dataset Splits | No | The paper does not give training, validation, or test splits; it only mentions using existing datasets such as CIFAR10 and MNIST. The evaluation focuses on verifying properties of already-trained networks.
Hardware Specification | Yes | "We use a single NVIDIA A100-PCI GPU with 40 GB RAM for bound refinement and an Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz with 64 GB RAM for MILP optimization."
Software Dependencies | Yes | "We implemented our method in Python with PyTorch v1.11 on top of the SOTA complete non-relational verifier α,β-CROWN [Wang et al., 2021]. We used Gurobi v11.0 as the off-the-shelf MILP solver."
Experiment Setup | Yes | "For both strong bounding and strong branching, we use Adam [Kingma and Ba, 2014] for parameter learning and run it for 20 iterations on each subproblem. We set kt = 10 for CIFAR-10 networks and kt = 24 for MNIST networks. For any relational property with k executions, we give an overall timeout of k minutes (averaging 1 minute per execution) to RABBit and all baselines. Each MILP instance gets a timeout of 10 minutes."
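The timeout scheme above can be sketched in a few lines (a minimal illustration under the stated assumptions of 1 minute per execution on average and a 10-minute cap per MILP instance; the function name is my own):

```python
def timeout_budget(k_executions, per_execution_s=60, milp_cap_s=600):
    """Budget for a relational property with k executions.

    The overall verifier timeout scales linearly with k (k minutes total),
    while each MILP instance is capped independently at milp_cap_s seconds.
    """
    overall_s = k_executions * per_execution_s
    return overall_s, milp_cap_s

# E.g. a UAP property over 20 executions gets a 20-minute overall budget.
overall, milp_cap = timeout_budget(20)
```

Keeping the overall budget proportional to k makes comparisons against baselines fair across properties of different sizes.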