Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Perils of Cascading Robust Classifiers

Authors: Ravi Mangal, Zifan Wang, Chi Zhang, Klas Leino, Corina Pasareanu, Matt Fredrikson

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical findings are accompanied by empirical results that further demonstrate this unsoundness. We present cascade attack (CasA), an adversarial attack against cascading ensembles, and conduct an empirical evaluation with the cascading ensembles trained by Wong et al. (2018) for the MNIST and CIFAR-10 datasets.
Researcher Affiliation | Collaboration | Ravi Mangal, Zifan Wang, Chi Zhang (Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213); Klas Leino (School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213); Corina Pasareanu (Carnegie Mellon University and NASA Ames, Moffett Field, CA 94043); Matt Fredrikson (School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213)
Pseudocode | Yes | Algorithm 3.1: Cascade Attack (CasA); a hedged sketch of the cascade decision rule it targets appears after this table.
Open Source Code | Yes | Our code is available at https://github.com/TristaChi/ensembleKW.
Open Datasets | Yes | For our measurements, we use the ℓ∞ and ℓ2 robust cascading ensembles constructed by Wong et al. (2018) for the MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky, 2009) datasets.
Dataset Splits | No | No explicit information on training/test/validation dataset splits or specific validation set details was found.
Hardware Specification | Yes | All our experiments were run on a pair of NVIDIA TITAN RTX GPUs with 24 GB of memory each, and a 4.2 GHz Intel Core i7-7700K with 64 GB of RAM.
Software Dependencies | No | Though we use the evaluation code and pre-trained models made available by Wong et al. (2018), the hardware and PyTorch versions we use in our experiments are different.
Experiment Setup | Yes | We solve the optimization problems on lines 6 and 8 using projected gradient descent (PGD) (Madry et al., 2018)... In Table 3, we report the hyper-parameters used to run CasA to reach the statistics reported in Table 1. Notice that if a normalization is µ = [0.485, 0.456, 0.406], σ = 0.225, we divide the ϵ and step size by σ during the experiment. We use SGD as the optimizer for all experiments... When learning the weights as described in Appendix D, we always set the temperature to 1e5 and the learning rate to 1e-2. (A hedged PGD sketch illustrating the ϵ/σ rescaling also follows the table.)
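
For context on the cascading construction that CasA (Algorithm 3.1) exploits, below is a minimal PyTorch-style sketch of a robust cascading ensemble's decision rule: components are queried in order, and the prediction of the first component that certifies robustness at the input is returned. The `certifies` callback is hypothetical, standing in for each component's certification procedure (e.g., the provable bounds of Wong et al. (2018)); this is a sketch of the general scheme, not the authors' implementation.

```python
import torch

def cascade_predict(models, certifies, x):
    """Return the prediction of the first component that certifies
    robustness at x, falling back to the last component otherwise.

    `models` is an ordered list of classifiers; `certifies(model, x)`
    is a hypothetical callback wrapping that component's certification
    procedure (e.g., Wong et al. (2018)'s bound computation).
    """
    for model in models[:-1]:
        if certifies(model, x):
            # A component that certifies a *wrong* label here is exactly
            # the unsoundness that the cascade attack (CasA) searches for.
            return model(x).argmax(dim=-1)
    return models[-1](x).argmax(dim=-1)
```

The design point the paper makes is visible in the control flow: certification by one component short-circuits the rest, so a certified-but-incorrect early component decides the cascade's output.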
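The Experiment Setup row notes that ϵ and the PGD step size are divided by σ when inputs are normalized. Here is a minimal ℓ∞ PGD sketch showing that rescaling, assuming inputs normalized as (x − µ)/σ; the function name, signature, and default σ = 0.225 are illustrative choices taken from the row above, not the authors' code.

```python
import torch

def pgd_linf(loss_fn, x, eps, step_size, steps, sigma=0.225):
    # Inputs are assumed normalized as (x - mu) / sigma, so a pixel-space
    # radius eps corresponds to eps / sigma in normalized space; the same
    # rescaling applies to the step size (cf. the Experiment Setup row).
    eps, step_size = eps / sigma, step_size / sigma
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(x + delta)  # e.g., cross-entropy of the attacked model
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()  # signed-gradient ascent step
            delta.clamp_(-eps, eps)                 # project onto the eps-ball
            delta.grad.zero_()
    return (x + delta).detach()
```

Within CasA, a PGD routine of this shape would be used to solve the inner optimization problems on lines 6 and 8 of Algorithm 3.1; the exact objectives are given in the paper.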