Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scaling the Convex Barrier with Sparse Dual Algorithms
Authors: Alessandro De Palma, Harkirat Singh Behl, Rudy Bunel, Philip H.S. Torr, M. Pawan Kumar
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We alleviate this deficiency by presenting two novel dual algorithms: one operates a subgradient method on a small active set of dual variables, the other exploits the sparsity of Frank-Wolfe type optimizers to incur only a linear memory cost. Both methods recover the strengths of the new relaxation: tightness and a linear separation oracle. At the same time, they share the benefits of previous dual approaches for weaker relaxations: massive parallelism, GPU implementation, low cost per iteration and valid bounds at any time. As a consequence, we can obtain better bounds than off-the-shelf solvers in only a fraction of their running time, attaining significant formal verification speed-ups. ... Providing a detailed experimental evaluation of the new solver, both for incomplete and complete verification. |
| Researcher Affiliation | Academia | Alessandro De Palma EMAIL Harkirat Singh Behl EMAIL Rudy Bunel EMAIL Philip H.S. Torr EMAIL M. Pawan Kumar EMAIL Department of Engineering Science University of Oxford Oxford OX1 3PJ |
| Pseudocode | Yes | Algorithm 1 Active Set 1: function activeset compute bounds(Problem (5)) ... Algorithm 2 Saddle Point 1: function saddlepoint compute bounds(Problem (5)) |
| Open Source Code | Yes | Code implementing our algorithms is available as part of the OVAL neural network verification framework: https://github.com/oval-group/oval-bab. |
| Open Datasets | Yes | on the CIFAR-10 test set (Krizhevsky and Hinton, 2009). |
| Dataset Splits | Yes | we restricted the experiments to the first 2567 CIFAR-10 test set images for the experiments on the SGD-trained network (Figures 1, 2), and to the first 4129 images for the network trained via the method by Madry et al. (2018) (Figures 9, 10). ... We restricted the original data set to 100 properties per network so as to mirror the setup of the recent VNN-COMP competition (VNN-COMP, 2020). |
| Hardware Specification | Yes | Unless otherwise stated, experiments were run under the following setup: Ubuntu 16.04.2 LTS, on a single Nvidia Titan Xp GPU, except those based on Gurobi and the CPU version of our solvers. The latter were run on i7-6850K CPUs, utilising 4 cores for the incomplete verification experiments, and 6 cores for the more demanding complete verification setting. |
| Software Dependencies | No | The implementation of our algorithms, available at https://github.com/oval-group/oval-bab as part of the OVAL neural network verification framework, is based on Pytorch (Paszke et al., 2017). ... Gurobi Optimization, 2020. |
| Experiment Setup | Yes | For all supergradient-based methods (Big-M, Active Set), we employed the Adam update rule (Kingma and Ba, 2015), which showed stronger empirical convergence. For Big-M, replicating the findings by Bunel et al. (2020a) on their supergradient method, we linearly decrease the step size from 10⁻² to 10⁻⁴. Active Set is initialized with 500 Big-M iterations, after which the step size is reset and linearly scaled from 10⁻³ to 10⁻⁶. We found the addition of variables to the active set to be effective before convergence: we add variables every 450 iterations, without re-scaling the step size again. Every addition consists of 2 new variables (see Algorithm 1) and we cap the maximum number of cuts to 7. This was found to be a good compromise between fast bound improvement and computational cost. For Saddle Point, we use 1/(t + 10) as step size to stay closer to the initialization points. The primal initialization algorithm (see appendix C.2) is run for 100 steps on the Big-M variables, with step size linearly decreased from 10⁻² to 10⁻⁵. |
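The quoted setup describes two step-size schedules: a linear decay between fixed endpoints (used by Big-M, Active Set, and the primal initialization) and a 1/(t + 10) rule (used by Saddle Point). A minimal sketch of these schedules is below; the function names and the `total_steps` parameterization are illustrative assumptions, not identifiers from the paper's code.

```python
def linear_decay(step, total_steps, lr_start, lr_end):
    """Linearly interpolate the step size from lr_start (step 0)
    to lr_end (last step), clamping past the end."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return lr_start + frac * (lr_end - lr_start)


def saddle_point_step_size(t):
    """The 1/(t + 10) schedule quoted for the Saddle Point solver."""
    return 1.0 / (t + 10)


# Big-M schedule from the quoted setup: 10^-2 down to 10^-4
# over its iteration budget (500 iterations used here for illustration).
print(linear_decay(0, 500, 1e-2, 1e-4))    # first step: 0.01
print(linear_decay(499, 500, 1e-2, 1e-4))  # last step: 0.0001
print(saddle_point_step_size(0))           # first step: 0.1
```

In the reported configuration these schedules drive Adam's base learning rate rather than a plain (super)gradient step, and Active Set resets the schedule (10⁻³ to 10⁻⁶) after its 500 Big-M warm-up iterations.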