Provably Bounding Neural Network Preimages

Authors: Suhas Kotha, Christopher Brix, J. Zico Kolter, Krishnamurthy Dvijotham, Huan Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate INVPROP on three different benchmarks. Across benchmarks, we find orders of magnitude improvement over prior work, and our methods work even for relatively large networks (167k neurons) and high dimensionality inputs (8112 dimensions). All implementation details are described in Appendix F and the utilized hardware is described in Appendix G.
Researcher Affiliation | Collaboration | Suhas Kotha (Carnegie Mellon, suhask@andrew.cmu.edu); Christopher Brix (RWTH Aachen, brix@cs.rwth-aachen.de); Zico Kolter (Carnegie Mellon / Bosch Center for AI, zkolter@cs.cmu.edu); Krishnamurthy (Dj) Dvijotham (Google DeepMind, dvijothamcs@gmail.com); Huan Zhang (UIUC, huan@huan-zhang.com)
Pseudocode | Yes | Algorithm 1 INVPROP. Can be applied to branches for non-convex overapproximations. Lower and upper bounds of all neurons and all constraints are optimized using distinct α and γ values in g.
Open Source Code | Yes | Instructions for reproducing our results are available at https://github.com/kothasuhas/verify-input
Open Datasets | Yes | MNIST: Yang et al. [2021b] provides four networks with ReLU activation functions and dense linear layers of 2/6/6/9 layers of 100/100/200/200 neurons, each trained on MNIST. For each network, 50 inputs were tested for local robustness.
Dataset Splits | No | The paper mentions using specific benchmarks (MNIST, YOLO, Double Integrator, 6D Quadrotor) but does not provide explicit training, validation, and test split percentages or counts for these datasets within the text. It refers to established benchmarks and problem setups.
Hardware Specification | Yes | For the control benchmark, all experiments were performed on a Dual Xeon Gold 6138. For the OOD benchmark, all experiments were performed on an Intel Xeon Platinum 8160 processor using 8 cores and 40GB RAM, as well as a V100-SXM2 GPU. For the MNIST and YOLO robustness verification benchmarks, an AWS instance of type g5.2xlarge was used, which is equipped with an AMD EPYC 7R32 processor with 8 cores, 32GB RAM, as well as an A10G GPU.
Software Dependencies | No | The paper mentions 'PyTorch' as a library but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | We initialize all α with 0.5 and all γ with 0.025. The optimization is performed for all lower and upper bounds of all neurons in each layer in parallel, starting with the last layer and moving forward to the input layer. The improvements on c⊤x are measured every 10th iteration. Before doing so, the bounds of the hyperplanes are tightened for 10 extra steps to improve their precision. All cutting hyperplanes are evenly distributed to maximize their information gain. For 40 hyperplanes in a 2D input space, we rotate each plane by 9°. Both α,β-CROWN and α,β-CROWN+INVPROP were run with a per-input timeout of 5 minutes. (Illustrative sketches of this setup follow the table.)
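
The pseudocode and setup rows above describe optimizing a distinct α and γ value for every lower and upper bound, initialized to 0.5 and 0.025. Below is a minimal, hypothetical PyTorch sketch of that initialization and update loop; the layer sizes, optimizer choice, learning rate, and the `bound_objective` placeholder are illustrative assumptions, not the authors' implementation.

```python
import torch

def bound_objective(alpha: torch.Tensor, gamma: torch.Tensor) -> torch.Tensor:
    # Placeholder for the CROWN-style bound computation that would propagate
    # these relaxation parameters through the network; a smooth dummy function
    # keeps this sketch runnable.
    return ((alpha - 0.3) ** 2 + (gamma - 0.1) ** 2).sum()

# One (alpha, gamma) pair per optimized bound: lower and upper bounds of every
# neuron (and output constraint) are treated as separate optimization targets.
num_neurons, num_bounds = 200, 2  # illustrative sizes, not taken from the paper
alpha = torch.full((num_bounds, num_neurons), 0.5, requires_grad=True)    # init 0.5
gamma = torch.full((num_bounds, num_neurons), 0.025, requires_grad=True)  # init 0.025

opt = torch.optim.Adam([alpha, gamma], lr=0.1)
for step in range(100):
    opt.zero_grad()
    loss = bound_objective(alpha, gamma)
    loss.backward()
    opt.step()
    with torch.no_grad():          # keep parameters in their valid ranges
        alpha.clamp_(0.0, 1.0)
        gamma.clamp_(min=0.0)
```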
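
Likewise, the "40 hyperplanes in a 2D input space, rotated by 9°" layout in the setup row amounts to spacing the hyperplane normals 360°/40 = 9° apart. A small sketch, assuming unit normals; the function and variable names are illustrative, not from the paper's code.

```python
import numpy as np

def evenly_rotated_normals(num_planes: int = 40) -> np.ndarray:
    # Spread the normals uniformly over the circle: consecutive hyperplanes
    # differ by a rotation of 360 / num_planes degrees (9 degrees for 40 planes).
    angles = np.deg2rad(np.arange(num_planes) * (360.0 / num_planes))
    # Each row is the unit normal c_k of one cutting hyperplane; the offsets of
    # these hyperplanes are what the bound optimization then tightens.
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

normals = evenly_rotated_normals()
print(normals.shape)  # (40, 2)
```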