Boosting Randomized Smoothing with Variance Reduced Classifiers
Authors: Miklós Z. Horváth, Mark Niklas Müller, Marc Fischer, Martin Vechev
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show that ensembles of only 3 to 10 classifiers consistently improve on their strongest constituting model with respect to their average certified radius (ACR) by 5% to 21% on both CIFAR10 and ImageNet, achieving a new state-of-the-art ACR of 0.86 and 1.11, respectively. We evaluate the proposed methods on the CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Russakovsky et al., 2015) datasets with respect to two key metrics: (i) the certified accuracy at predetermined radii and (ii) the average certified radius (ACR). (Both metrics are sketched in code below the table.) |
| Researcher Affiliation | Academia | Miklós Z. Horváth, Mark Niklas Müller, Marc Fischer, Martin Vechev Department of Computer Science ETH Zurich, Switzerland mihorvat@ethz.ch, {mark.mueller,marc.fischer,martin.vechev}@inf.ethz.ch |
| Pseudocode | Yes | Algorithm 1 (Certify, from Cohen et al., 2019) and Algorithm 2 (Adaptive Sampling). (A hedged sketch of Certify follows the table.) |
| Open Source Code | Yes | We release all code and models required to reproduce our results at https://github.com/eth-sri/smoothing-ensembles. |
| Open Datasets | Yes | We evaluate the proposed methods on the CIFAR10 (Krizhevsky et al., 2009) and ImageNet (Russakovsky et al., 2015) datasets with respect to two key metrics: (i) the certified accuracy at predetermined radii and (ii) the average certified radius (ACR). |
| Dataset Splits | Yes | We evaluate every 20th image of the CIFAR10 test set and every 100th of the ImageNet test set (500 samples total). ImageNet (Russakovsky et al., 2015) contains 1 281 167 training and 50 000 validation images, partitioned into 1000 classes. To rank the single models for CIFAR10, we have evaluated them on a disjoint hold-out portion of the CIFAR10 test set. Concretely, we use the test images with indices 1, 21, 41, ..., 9981 to rank the single models for GAUSSIAN, CONSISTENCY, and SMOOTHADV trained models. (The index arithmetic behind this split is sketched below the table.) |
| Hardware Specification | Yes | We implement our approach in PyTorch (Paszke et al., 2019) and evaluate on CIFAR10 with ensembles of ResNet20 and ResNet110 and on ImageNet with ensembles of ResNet50 (He et al., 2016), using 1 and 2 GeForce RTX 2080 Ti, respectively. CIFAR10 models were trained on a single GeForce RTX 2080 Ti and ImageNet models on quadruple 2080 Tis. |
| Software Dependencies | No | The paper mentions PyTorch but does not give a version number, and other key dependencies such as CUDA are likewise not pinned to specific versions, which limits exact reproducibility. For example: 'We implement our approach in PyTorch (Paszke et al., 2019)...' |
| Experiment Setup | Yes | We use the same training schedule and optimizer for all models, i.e., stochastic gradient descent with Nesterov momentum (weight = 0.9, no dampening), with an ℓ2 weight decay of 0.0001. For CIFAR10, we use a batch size of 256 and an initial learning rate of 0.1, reducing it by a factor of 10 every 50 epochs and training for a total of 150 epochs. For ImageNet, we use the same settings, only reducing the total epoch number to 90 and decreasing the learning rate every 30 epochs. (An optimizer/scheduler sketch follows the table.) |
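
The two quoted evaluation metrics can be computed directly from per-sample certification output. Below is a minimal sketch, not the authors' code; the array layout and variable names are assumptions.

```python
import numpy as np

def certified_accuracy(radii, correct, r):
    """Fraction of test samples correctly classified and certified at radius >= r."""
    return np.mean(correct & (radii >= r))

def average_certified_radius(radii, correct):
    """ACR: mean certified radius over the test set, counting misclassified
    or abstained samples as radius 0."""
    return np.mean(np.where(correct, radii, 0.0))

# Hypothetical certification results for 5 test samples.
radii = np.array([0.5, 1.2, 0.0, 0.8, 2.0])
correct = np.array([True, True, False, True, True])
print(certified_accuracy(radii, correct, r=0.5))  # 0.8
print(average_certified_radius(radii, correct))   # 0.9
```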
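
Algorithm 1 follows the published Certify procedure of Cohen et al. (2019): guess the top class from a small number of noisy samples, lower-bound its probability p_A with a larger sample via a Clopper-Pearson bound, and certify an ℓ2 radius of σ·Φ⁻¹(p_A), abstaining when p_A ≤ 1/2. The sketch below is a simplified NumPy rendering under the assumption that `base_classifier` is a callable mapping an input array to a class index; it is not the repository's implementation.

```python
import numpy as np
from scipy.stats import beta, norm

def noisy_counts(base_classifier, x, n, sigma, num_classes):
    """Class counts of the base classifier over n Gaussian-perturbed copies of x."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        counts[base_classifier(x + sigma * np.random.randn(*x.shape))] += 1
    return counts

def certify(base_classifier, x, sigma, n0=100, n=100_000, alpha=0.001, num_classes=10):
    """Sketch of Certify (Cohen et al., 2019)."""
    # Guess the top class from n0 samples, then estimate it with n fresh samples.
    guess = int(noisy_counts(base_classifier, x, n0, sigma, num_classes).argmax())
    nA = noisy_counts(base_classifier, x, n, sigma, num_classes)[guess]
    # One-sided Clopper-Pearson lower confidence bound on p_A.
    pA = beta.ppf(alpha, nA, n - nA + 1) if nA > 0 else 0.0
    if pA <= 0.5:
        return None, 0.0                    # abstain
    return guess, sigma * norm.ppf(pA)      # certified l2 radius
```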
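
The quoted evaluation/hold-out split is simple index arithmetic. The snippet below is an assumption consistent with the quoted indices (not extracted from the repository); it shows that the two 500-image subsets of the 10 000-image CIFAR10 test set are disjoint.

```python
# Certification subset: every 20th CIFAR10 test image (assumed indices 0, 20, ..., 9980).
eval_indices = list(range(0, 10_000, 20))      # 500 images
# Hold-out used to rank single models (indices 1, 21, ..., 9981, as quoted).
holdout_indices = list(range(1, 10_000, 20))   # 500 images
assert set(eval_indices).isdisjoint(holdout_indices)
```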
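
The quoted CIFAR10 training configuration maps directly onto standard PyTorch components. This is a hedged sketch with a placeholder architecture, not the released training script.

```python
import torch
import torchvision

model = torchvision.models.resnet18()  # placeholder; the paper uses ResNet20/110 and ResNet50
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            dampening=0, nesterov=True, weight_decay=1e-4)
# Decay the learning rate by 10x every 50 epochs (every 30 epochs for ImageNet).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

for epoch in range(150):  # 90 epochs for ImageNet
    # ... one training epoch with batch size 256 ...
    scheduler.step()
```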