Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Verification of Non-Linear Specifications for Neural Networks
Authors: Chongli Qin, Krishnamurthy (Dj) Dvijotham, Brendan O'Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli
ICLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation shows that our method is able to effectively verify these specifications. Moreover, our evaluation exposes the failure modes in models which cannot be verified to satisfy these specifications. Thus, emphasizing the importance of training models not just to fit training data but also to be consistent with specifications. 4 EXPERIMENTS We have proposed a novel set of specifications to be verified as well as new verification algorithms that can verify whether these specifications are satisfied by neural networks. In order to validate our contributions experimentally, we perform two sets of experiments: |
| Researcher Affiliation | Industry | DeepMind London, N1C 4AG, UK correspondence: EMAIL |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We study the semantic distance specification (1) in the context of the CIFAR-10 dataset. We define the distances d(i, j) between labels as their distance according to Wordnet (Miller, 1995) (the full distance matrix used is shown in Appendix F). [...] MNIST and CIFAR-10: For these datasets both models was trained to maximize the log likelihood of true label predictions while being robust to adversarial examples via the method described in (Wong & Kolter, 2018). The training for model A places a heavier loss than model B when robustness measures are violated. Mujoco: To test the energy specification, we used the Mujoco physics engine (Todorov et al., 2012) to create a simulation of a simple pendulum with damping friction. |
| Dataset Splits | No | The paper mentions a test set for the Mujoco dataset ('27000 was set aside as test set'), but it does not specify a validation set split or other dataset split details for any of the datasets used. |
| Hardware Specification | No | The paper mentions 'our desktop machine (with 1 GPU and 8G of memory)' but does not provide specific GPU or CPU models, or further detailed hardware specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software tools like 'GLOP as the LP solver' and 'CVXPY' but does not specify their version numbers, which are necessary for reproducible descriptions of software dependencies. |
| Experiment Setup | Yes | 4.2 EXPERIMENTAL SETUP For each of the specifications we trained two networks (referred to as model A and B in the following) that satisfy our specification to varying degrees. [...] Pendulum: Model A We train with an ℓ1 loss and energy loss on the next state prediction, the exact loss we impose on this model is (we denote $(w_T, h_T, s\omega_T)$ as ground truth state): $l(f) = \underbrace{\lVert f(w, h, s\omega) - (w_T, h_T, s\omega_T) \rVert}_{\ell_1 \text{ loss}} + \underbrace{\lvert E(f(w, h, s\omega)) - E((w_T, h_T, s\omega_T)) \rvert}_{\text{energy difference loss}} + \underbrace{\mathrm{ReLU}(E(f(w, h, s\omega)) - E(w, h, s\omega))}_{\text{increase in energy loss}}$ [...] CIFAR-10: Model A We use a network that is verifiably robust to adversarial perturbations of size 8/255 (where 255 is the range of pixel values) on 24.61% of the test examples, with respect to the standard specification that the output of the network should remain invariant to adversarial perturbations of the input. The network consists of 4 convolutional layers and 3 linear layers in total 860000 parameters. |
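
The three-term pendulum loss quoted in the Experiment Setup row can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the three-component state layout, and in particular the energy function `pendulum_energy` (with its mass, gravity, and length constants) are assumptions made for illustration, since the quoted text does not define E.

```python
import numpy as np


def pendulum_energy(state, mass=1.0, gravity=9.81, length=1.0):
    """Hypothetical energy E (an assumption, not from the paper):
    potential m*g*h plus kinetic 0.5*m*(l*omega)^2.
    The state is assumed to be (w, h, omega-like term)."""
    _, h, omega = state
    return mass * gravity * h + 0.5 * mass * (length * omega) ** 2


def model_a_loss(pred, state, target, energy=pendulum_energy):
    """Sum of the three terms named in the quoted loss.

    pred   -- network output f(state), the predicted next state
    state  -- current input state
    target -- ground-truth next state
    """
    pred, state, target = (np.asarray(x, dtype=float) for x in (pred, state, target))
    # l1 loss between prediction and ground truth
    l1 = np.sum(np.abs(pred - target))
    # energy difference loss: predicted vs. ground-truth energy
    energy_diff = abs(energy(pred) - energy(target))
    # increase in energy loss: ReLU of the energy gained relative to the input
    energy_increase = max(energy(pred) - energy(state), 0.0)
    return l1 + energy_diff + energy_increase


if __name__ == "__main__":
    state = [0.0, 1.0, 0.0]
    target = [0.0, 1.0, 0.0]
    pred = [0.0, 2.0, 0.0]  # prediction that raises the pendulum, adding energy
    print(model_a_loss(pred, state, target))
```

Under these assumptions, the third term is zero whenever the predicted state has no more energy than the input state, so it only penalizes predictions that add energy to the system.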