Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Adversarial Vulnerability of Randomized Ensembles
Authors: Hassan Dbouk, Naresh Shanbhag
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments across a variety of network architectures, training schemes, datasets, and norms to support our claims, and empirically establish that randomized ensembles are in fact more vulnerable to ℓp-bounded adversarial perturbations than even standard AT models. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, USA. |
| Pseudocode | Yes | Algorithm 1 The ARC Algorithm for BLCs; Algorithm 2 The ARC Algorithm |
| Open Source Code | Yes | Our code can be found at https://github.com/hsndbk4/ARC. |
| Open Datasets | Yes | datasets (SVHN (Netzer et al., 2011), CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100, and ImageNet (Krizhevsky et al., 2012)) |
| Dataset Splits | No | The paper mentions using 'early stopping to avoid robust over-fitting' which implies the use of a validation set, but it does not specify the exact size, percentage, or method for creating the validation split. It primarily details test set usage. |
| Hardware Specification | Yes | A single workstation with two NVIDIA Tesla P100 GPUs is used for running all the training experiments. We use a workstation with a single NVIDIA 1080 Ti GPU and iterate over the testset with a mini-batch size of 256 for all evaluations. |
| Software Dependencies | No | The paper mentions software like PyTorch (Paszke et al., 2017) and Foolbox (Rauber et al., 2017), but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | SVHN: Models are trained for 200 epochs, using a PGD adversary with K = 7 iterations with: ϵ = 8/255 and η = 2/255 for ℓ∞ AT, and ϵ = 128/255 and η = 32/255 for ℓ2 AT. We use stochastic gradient descent (SGD) with momentum (0.9), 128 mini-batch size, and a step-wise learning rate decay set initially at 0.1 and divided by 10 at epochs 100 and 150. We employ weight decay of 2 × 10⁻⁴. |
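
The step-wise learning rate decay quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the function name `learning_rate`, the constant `TOTAL_EPOCHS`, and the convention that epochs are counted from 0 (so the first decay applies at epoch index 100) are assumptions made here for illustration.

```python
# Hypothetical sketch of the quoted schedule: start at 0.1 and divide by 10
# at epochs 100 and 150. Zero-based epoch indexing is an assumption.
TOTAL_EPOCHS = 200  # SVHN training length quoted in the table


def learning_rate(epoch, base_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Return the learning rate in effect during the given epoch."""
    # Count how many decay milestones have already been reached.
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


if __name__ == "__main__":
    # Print the rate at the start of training and around each milestone.
    for epoch in (0, 99, 100, 149, 150, TOTAL_EPOCHS - 1):
        print(f"epoch {epoch:3d}: lr = {learning_rate(epoch):.4f}")
```

In a PyTorch training loop the same behavior would normally come from a built-in scheduler such as `torch.optim.lr_scheduler.MultiStepLR`; the plain function above only makes the arithmetic explicit.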