Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Provable Editing of Deep Neural Networks using Parametric Linear Relaxation
Authors: Zhe Tao, Aditya V Thakur
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PREPARED i) using the VNN-COMP benchmarks, ii) by editing CIFAR10 and TINYIMAGENET image-recognition DNNs, and BERT sentiment-classification DNNs for local robustness, and iii) by training a DNN to model a geodynamics process and satisfy physics constraints. |
| Researcher Affiliation | Academia | Zhe Tao University of California, Davis Davis, CA 95616, USA EMAIL Aditya V. Thakur University of California, Davis Davis, CA 95616, USA EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. The methods are described using mathematical definitions and textual explanations. |
| Open Source Code | No | For the implementation of our tool, we could not make the code open access due to ongoing IP restrictions. |
| Open Datasets | Yes | We evaluate PREPARED i) using the VNN-COMP benchmarks, ii) by editing CIFAR10 and TINYIMAGENET image-recognition DNNs, and BERT sentiment-classification DNNs for local robustness, and iii) by training a DNN to model a geodynamics process and satisfy physics constraints. ... The CIFAR10 DNN has 17.2M parameters... The Tiny Image Net DNN has 51.9M parameters... (citing [21] for CIFAR10, [33] for TINYIMAGENET, and [29] for VNN-COMP 22 benchmarks). |
| Dataset Splits | Yes | The validation set is the same 10% of the training set, randomly selected with seed 0. ... We train 2000 epochs and take the epoch with the highest validation accuracy. |
| Hardware Specification | Yes | All experiments were run on a machine with Dual Intel Xeon Silver 4216 Processor 16-Core 2.1GHz with 384 GB of memory, SSD and RTX-A6000 with 48 GB of GPU memory running Ubuntu 20.04. |
| Software Dependencies | No | We have implemented PREPARED in Py Torch [32] and use Gurobi [17] as the LP solver. ... We use verifiers α,β-CROWN [46, 44], MN-Ba B [9], or Deep T [4], depending on the task. ... We use auto_Li RPA [50] or Deep Poly [39] to compute the constant bounds for the input to the first layer to edit. (No explicit version numbers stated directly with the software names). |
| Experiment Setup | Yes | Appendix D.2, D.3, D.4, and D.5 provide detailed hyperparameters for each experiment, including optimizers, learning rates, batch sizes, number of epochs, and layer freezing strategies. |