Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Verification of Non-Linear Specifications for Neural Networks
Authors: Chongli Qin, Krishnamurthy (Dj) Dvijotham, Brendan O'Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli
ICLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation shows that our method is able to effectively verify these specifications. Moreover, our evaluation exposes the failure modes in models which cannot be verified to satisfy these specifications, thus emphasizing the importance of training models not just to fit training data but also to be consistent with specifications. 4 EXPERIMENTS We have proposed a novel set of specifications to be verified as well as new verification algorithms that can verify whether these specifications are satisfied by neural networks. In order to validate our contributions experimentally, we perform two sets of experiments: |
| Researcher Affiliation | Industry | DeepMind, London, N1C 4AG, UK correspondence: EMAIL |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We study the semantic distance specification (1) in the context of the CIFAR-10 dataset. We define the distances d(i, j) between labels as their distance according to WordNet (Miller, 1995) (the full distance matrix used is shown in Appendix F). [...] MNIST and CIFAR-10: For these datasets both models were trained to maximize the log likelihood of true-label predictions while being robust to adversarial examples via the method described in (Wong & Kolter, 2018). The training for model A places a heavier loss than model B when robustness measures are violated. Mujoco: To test the energy specification, we used the Mujoco physics engine (Todorov et al., 2012) to create a simulation of a simple pendulum with damping friction. |
| Dataset Splits | No | The paper mentions a test set for the Mujoco dataset ('27000 was set aside as test set'), but it does not specify a validation set split or other dataset split details for any of the datasets used. |
| Hardware Specification | No | The paper mentions 'our desktop machine (with 1 GPU and 8G of memory)' but does not provide specific GPU or CPU models, or further detailed hardware specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software tools like 'GLOP as the LP solver' and 'CVXPY' but does not specify their version numbers, which are necessary for reproducible descriptions of software dependencies. |
| Experiment Setup | Yes | 4.2 EXPERIMENTAL SETUP For each of the specifications we trained two networks (referred to as model A and B in the following) that satisfy our specification to varying degrees. [...] Pendulum: Model A We train with an ℓ1 loss and an energy loss on the next-state prediction; the exact loss we impose on this model is (we denote (w_T, h_T, θ̇_T) as the ground-truth state): l(f) = ‖f(w, h, θ̇) − (w_T, h_T, θ̇_T)‖₁ (ℓ1 loss) + \|E(f(w, h, θ̇)) − E((w_T, h_T, θ̇_T))\| (energy difference loss) + ReLU(E(f(w, h, θ̇)) − E(w, h, θ̇)) (increase-in-energy loss) [...] CIFAR-10: Model A We use a network that is verifiably robust to adversarial perturbations of size 8/255 (where 255 is the range of pixel values) on 24.61% of the test examples, with respect to the standard specification that the output of the network should remain invariant to adversarial perturbations of the input. The network consists of 4 convolutional layers and 3 linear layers, with 860,000 parameters in total. |