Neural Architecture Design and Robustness: A Dataset

Authors: Steffen Jung, Jovita Lukasik, Margret Keuper

ICLR 2023

Reproducibility assessment. For each variable: the result, followed by the supporting LLM response.
Research Type: Experimental
"We evaluate all these networks on a range of common adversarial attacks and corruption types and introduce a database on neural architecture design and robustness evaluations. We further present three exemplary use cases of this dataset, in which we (i) benchmark robustness measurements based on Jacobian and Hessian matrices for their robustness predictability, (ii) perform neural architecture search on robust accuracies, and (iii) provide an initial analysis of how architectural design choices affect robustness."
Researcher Affiliation: Academia
"1 Max Planck Institute for Informatics, Saarland Informatics Campus, {steffen.jung,jlukasik,keuper}@mpi-inf.mpg.de; 2 University of Siegen"
Pseudocode: Yes
"Algorithm 1: Robustness Dataset Gathering"
Open Source Code: Yes
"Code and data is available at http://robustness.vision/."
Open Datasets: Yes
"Each architecture is trained on three different image datasets for 200 epochs: CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009) and ImageNet16-120 (Chrabaszcz et al., 2017)."
Dataset Splits: No
The paper explicitly mentions using "test splits" but does not detail the validation splits needed to reproduce the experiments.
Hardware Specification: Yes
"These clusters are comprised of either (i) compute nodes with NVIDIA A100 GPUs, 512 GB RAM, and Intel Xeon Ice Lake-SP processors, (ii) compute nodes with NVIDIA Quadro RTX 8000 GPUs, 1024 GB RAM, and AMD EPYC 7502P processors, (iii) NVIDIA Tesla A100 GPUs, 2048 GB RAM, Intel Xeon Platinum 8360Y processors, and (iv) NVIDIA Tesla A40 GPUs, 2048 GB RAM, Intel Xeon Platinum 8360Y processors."
Software Dependencies: No
The paper mentions tools such as Foolbox and methods such as Chatzimichailidis et al. (2019) for computation, but does not provide version numbers for these software components or for other dependencies.
Experiment Setup: Yes
"In the case of architectures trained for NAS-Bench-201, this is cross entropy (CE). Since attacks via FGSM can be evaluated fairly efficiently, we evaluate all architectures for ε ∈ E_FGSM = {0.1, 0.5, 1, 2, ..., 8, 255}/255, so for a total of |E_FGSM| = 11 times for each architecture. We use Foolbox (Rauber et al., 2017) to perform the attacks, and collect (a) accuracy, (b) average prediction confidences, as well as (c) confusion matrices for each network and ε combination. ... Therefore, we find it sufficient to evaluate PGD for ε ∈ E_PGD = {0.1, 0.5, 1, 2, 3, 4, 8}/255, so for a total of |E_PGD| = 7 times for each architecture. As for FGSM, we use Foolbox (Rauber et al., 2017) to perform the attacks using their L∞ PGD implementation and keep the default settings, which are α = 0.01/0.3 for 40 attack iterations. ... We kept the default number of attack iterations that is 100. ... We kept the default number of search iterations at 5 000."
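The FGSM and PGD evaluations described above follow the standard definitions of these attacks. A minimal pure-Python sketch of both, not the paper's Foolbox pipeline: the logistic-regression model, the helper names, and the interpretation of α = 0.01/0.3 as a step size relative to ε (mirroring Foolbox's default for L∞ PGD) are illustrative assumptions.

```python
import math


def logistic_grad(x, y, w, b):
    """Analytic input gradient of the cross-entropy loss for a toy
    logistic-regression model: dL/dx = (sigmoid(w.x + b) - y) * w.
    Stands in for the backward pass through a real network."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm_step(x, grad, eps):
    """Move each coordinate by eps in the sign of the loss gradient."""
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def clip(x, lo, hi):
    return [min(hi, max(lo, xi)) for xi in x]


def fgsm(x, y, w, b, eps):
    """Single-step FGSM, clipped to the valid pixel range [0, 1]."""
    return clip(fgsm_step(x, logistic_grad(x, y, w, b), eps), 0.0, 1.0)


def pgd(x0, y, w, b, eps, rel_step=0.01 / 0.3, steps=40):
    """Iterated FGSM with projection onto the L-infinity ball of radius
    eps around x0 and onto the [0, 1] box. The absolute step size is
    rel_step * eps (assumed reading of the paper's alpha = 0.01/0.3)."""
    x = list(x0)
    for _ in range(steps):
        x = fgsm_step(x, logistic_grad(x, y, w, b), rel_step * eps)
        # Project back into the eps-ball around the clean input, then clip.
        x = [min(x0i + eps, max(x0i - eps, xi)) for xi, x0i in zip(x, x0)]
        x = clip(x, 0.0, 1.0)
    return x
```

For example, `fgsm([0.5, 0.5], 1, [2.0, -3.0], 0.0, 8 / 255)` perturbs each pixel by exactly ε = 8/255, the largest standard budget in E_FGSM, while `pgd` keeps its iterates within the same ε-ball around the clean input.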