DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles
Authors: Huanrui Yang, Jingyang Zhang, Hongliang Dong, Nathan Inkawhich, Andrew Gardner, Andrew Touchet, Wesley Wilkes, Heath Berry, Hai Li
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare DVERGE with various counterparts, including Baseline which trains an ensemble in a standard way and two previous robust ensemble training methods: ADP [12] and GAL [13]. For a fair comparison, we use ResNet-20 [27] as sub-models and average the output probabilities after the soft-max layer of each sub-model to yield the final predictions of ensembles. All the evaluations are performed on the CIFAR-10 dataset [28]. |
| Researcher Affiliation | Collaboration | Huanrui Yang¹, Jingyang Zhang¹, Hongliang Dong¹, Nathan Inkawhich¹, Andrew Gardner², Andrew Touchet², Wesley Wilkes², Heath Berry², Hai Li¹. ¹Department of Electrical and Computer Engineering, Duke University; ²Radiance Technologies. ¹{huanrui.yang, jz288, hongliang.dong, nai2, hai.li}@duke.edu, ²{andrew.gardner, atouchet, Wesley.Wilkes, Heath.Berry}@radiancetech.com |
| Pseudocode | Yes | Algorithm 1 shows the pseudo-code for training an ensemble of N sub-models. |
| Open Source Code | Yes | The code of this work is available at https://github.com/zjysteven/DVERGE. |
| Open Datasets | Yes | All the evaluations are performed on the CIFAR-10 dataset [28]. |
| Dataset Splits | No | The paper mentions training and testing on CIFAR-10 but does not explicitly provide details about a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper states "We implement DVERGE with PyTorch [38] on NVIDIA GPUs with Adam optimizer [37]" but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Training configuration details can be found in Appendix A. For DVERGE, we use PGD with momentum [29] to perform the feature distillation in Equation (1). We conduct 10 steps of gradient descent during feature distillation with a step size of ϵ/10. The ϵ used for each ensemble size to achieve the results in this section was empirically chosen for the highest diversity and lowest transferability, such that ϵ = 0.07, 0.05, 0.05 for ensembles with 3, 5, and 8 sub-models, respectively. [...] We use a batch size of 128. The learning rate is initialized as 0.001 and decays to 0.0001 after 200 epochs and further decays to 0.00001 after 300 epochs. The total training epoch is 350. For the PGD attack in feature distillation, we apply 10 steps with step size ϵ/10 and 5 random starts. The initial pretraining is done with 50 epochs. |
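
To make the quoted experiment setup concrete, the following is a minimal, hypothetical PyTorch sketch of the PGD-with-momentum feature distillation step and the Adam learning-rate schedule described above. The names `feature_distill` and `f_l` (a sub-model truncated at a distillation layer) are illustrative assumptions, not identifiers from the authors' released code, and only one random start is shown for brevity.

```python
# Sketch of the quoted configuration: PGD with momentum for feature distillation
# (10 steps, step size eps/10) and the Adam LR schedule (1e-3 -> 1e-4 after
# epoch 200 -> 1e-5 after epoch 300, 350 epochs total). Hypothetical names.
import torch

def feature_distill(f_l, x_source, x_target, eps=0.07, steps=10, momentum=0.9):
    """Find z near x_source whose layer-l features match those of x_target."""
    step_size = eps / steps                                   # eps/10, as quoted
    z = x_source + torch.empty_like(x_source).uniform_(-eps, eps)  # one random start
    z = z.clamp(0, 1).detach()
    g = torch.zeros_like(z)                                   # momentum buffer
    target_feat = f_l(x_target).detach()
    for _ in range(steps):
        z.requires_grad_(True)
        loss = (f_l(z) - target_feat).pow(2).sum()            # feature-matching objective
        grad, = torch.autograd.grad(loss, z)
        g = momentum * g + grad / grad.abs().mean().clamp_min(1e-12)
        z = z.detach() - step_size * g.sign()                 # descend to shrink feature distance
        z = (x_source + (z - x_source).clamp(-eps, eps)).clamp(0, 1)  # stay in the eps-ball
    return z.detach()

# Optimizer and learning-rate schedule as described (batch size 128 handled by the data loader).
model = torch.nn.Linear(10, 10)                               # placeholder for a ResNet-20 sub-model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 300], gamma=0.1)
```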