Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Set-Based Training for Neural Network Verification

Authors: Lukas Koller, Tobias Ladner, Matthias Althoff

TMLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our extensive evaluation demonstrates that set-based training produces robust neural networks with competitive performance, which can be verified using fast (polynomial-time) verification algorithms due to the reduced output set. An extensive empirical evaluation in which we demonstrate the competitive performance of our set-based training and compare it with state-of-the-art robust training approaches. Moreover, we include large-scale ablation studies to justify our design choices.
Researcher Affiliation	Academia	Lukas Koller, Technical University of Munich; Tobias Ladner, Technical University of Munich; Matthias Althoff, Professorship for Cyber-Physical Systems, Technical University of Munich
Pseudocode	Yes	Algorithm 1: Image Enclosure of a Nonlinear Layer. Algorithm 2: Set-based Training Iteration.
Open Source Code	No	We use the MATLAB toolbox CORA (Althoff, 2015) to implement set-based training. This statement indicates the use of a third-party toolbox, not the release of the authors' own implementation code for their methodology. No explicit statement of code release or repository link is found.
Open Datasets	Yes	We train a 6-layer convolutional neural network on MNIST (LeCun et al., 2010), SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky, 2009), and TinyImageNet (Le & Yang, 2015).
Dataset Splits	Yes	We use the canonical split of training and test data for each dataset and the entire test data for evaluation; because test labels are not available for TinyImageNet, we follow (Müller et al., 2023) and use the validation set for testing.
Hardware Specification	Yes	Our experiments were run on a server with 2 AMD EPYC 7763 (64 cores/128 threads), 2TB RAM, and a NVIDIA A100 40GB GPU.
Software Dependencies	No	The paper mentions using the 'MATLAB toolbox CORA (Althoff, 2015)' and 'Adam optimizer (Kingma & Ba, 2015)' but does not specify version numbers for these or any other software components.
Experiment Setup	Yes	Table 6: Training hyperparameters. #Epochs Dataset η ϵ τ Batch Size (warm-up / ramp-up) Decay ...; The weights and biases are initialized as in (Shi et al., 2021).; We use Adam optimizer (Kingma & Ba, 2015) with the recommended hyperparameters.; For any PGD during training (...) we used the settings from (Müller et al., 2023): 8 iterations with an initial step size 0.5, which is decayed twice by 0.1 at iterations 4 and 7. All PGD attacks for testing are computed with 40 iterations of step size 0.01.
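
The PGD settings quoted in the Experiment Setup row can be made concrete with a small step-size schedule. The following is a minimal Python sketch, not the authors' implementation (the paper reports using the MATLAB toolbox CORA). The function name `pgd_step_sizes` and its parameters are hypothetical, and it assumes that iterations are counted from 1 and that "decayed by 0.1" means the step size is multiplied by 0.1 starting at the named iteration; the quoted text does not settle either point.

```python
def pgd_step_sizes(num_iters=8, initial=0.5, decay=0.1, decay_at=(4, 7)):
    """Return the step size used at each PGD iteration.

    Assumptions (not confirmed by the source): iterations are 1-based,
    and the decay factor is applied from the listed iteration onward.
    """
    sizes = []
    step = initial
    for iteration in range(1, num_iters + 1):
        if iteration in decay_at:
            step *= decay  # multiplicative decay at iterations 4 and 7
        sizes.append(step)
    return sizes


# Training-time PGD as quoted: 8 iterations, initial step size 0.5,
# decayed twice by 0.1 at iterations 4 and 7.
train_schedule = pgd_step_sizes()

# Test-time PGD as quoted: 40 iterations with a constant step size of 0.01.
test_schedule = pgd_step_sizes(num_iters=40, initial=0.01, decay_at=())

if __name__ == "__main__":
    print("train:", train_schedule)
    print("test iterations:", len(test_schedule))
```

Under these assumptions the training schedule holds the initial step size for three iterations, uses one tenth of it for the next three, and one hundredth of it for the last two. How the step size relates to the perturbation radius ϵ is not stated in the quoted text, so the sketch leaves it unscaled.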