Building Robust Ensembles via Margin Boosting

Authors: Dinghuai Zhang, Hongyang Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, Arun Sai Suggala

ICML 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Through extensive empirical evaluation on benchmark datasets, we show that our algorithm not only outperforms existing ensembling techniques, but also large models trained in an end-to-end fashion. An important byproduct of our work is a margin-maximizing cross-entropy (MCE) loss, which is a better alternative to the standard cross-entropy (CE) loss. Empirically, we show that replacing the CE loss in state-of-the-art adversarial training techniques with our MCE loss leads to significant performance improvement.
Researcher Affiliation Collaboration 1Mila and Université de Montréal 2University of Waterloo 3Carnegie Mellon University 4Google Research. Correspondence to: Dinghuai Zhang <dinghuai.zhang@mila.quebec>.
Pseudocode Yes Algorithm 1 MRBOOST ... Algorithm 2 MRBOOST.NN ... Algorithm 3 SAMPLER.EXP (Exponential) ... Algorithm 4 SAMPLER.ALL
Open Source Code Yes Our code is available at https://github.com/zdhNarsil/margin-boosting.
Open Datasets Yes We consider three datasets: SVHN, CIFAR-10 and CIFAR-100.
Dataset Splits No The paper provides details on training parameters like learning rate, epochs, and batch size, but does not explicitly state dataset splits (e.g., percentages or counts for training, validation, and test sets).
Hardware Specification Yes These experiments were run on an NVIDIA A100 GPU.
Software Dependencies No The paper provides 'PyTorch-style pseudocode' but does not specify exact version numbers for PyTorch or other software dependencies.
Experiment Setup Yes We train ResNet-18 using SGD with momentum 0.9 for 100 epochs. The initial learning rate is set to 0.1 and is further decayed by a factor of 10 at the 50th and 75th epochs. The batch size is set to 128 in this work. We also use a weight decay of 5×10⁻⁴. For the ℓ∞ threat model, we use ϵ = 8/255. The attack step size is 1/255 for SVHN and 2/255 for CIFAR-10 and CIFAR-100. The PGD-10 (Madry et al., 2017) attack is used for adversarial training and PGD-20 is used during testing.
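The step-decay learning-rate schedule reported in the setup (initial rate 0.1, decayed by a factor of 10 at epochs 50 and 75 of a 100-epoch run) can be sketched in plain Python. This is an illustrative helper, not the authors' code; the function name and signature are assumptions.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(50, 75), factor=10.0):
    """Step-decay schedule: divide base_lr by `factor` at each milestone.

    Mirrors the reported setup (lr 0.1, decayed 10x at epochs 50 and 75
    over 100 epochs). Hypothetical helper for illustration only.
    """
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr /= factor
    return lr

# Epochs 0-49 run at 0.1, epochs 50-74 at 0.01, epochs 75-99 at 0.001.
```

In PyTorch this behavior would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 75], gamma=0.1)`.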
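The PGD attack used above (ϵ = 8/255, step size 1/255 or 2/255, 10 steps for training, 20 for evaluation) follows the standard signed-gradient ascent projected onto an ℓ∞ ball. A minimal scalar sketch of that loop, with the gradient supplied as a callback rather than computed by autograd (all names here are illustrative assumptions, not the paper's implementation):

```python
def pgd_attack(x0, grad_fn, epsilon, step_size, steps):
    """Projected gradient ascent within an l-infinity ball of radius epsilon.

    Scalar sketch of PGD (Madry et al., 2017): take signed-gradient steps
    to increase the loss, then clip back into [x0 - epsilon, x0 + epsilon].
    `grad_fn(x)` returns the loss gradient at x (a stand-in for autograd).
    """
    x = x0
    for _ in range(steps):
        x = x + step_size * (1.0 if grad_fn(x) > 0 else -1.0)  # signed step
        x = max(x0 - epsilon, min(x0 + epsilon, x))             # project
    return x
```

With the CIFAR settings above (ϵ = 8/255, step 2/255, 10 steps), a constantly positive gradient drives the perturbation to the ball's boundary, since 10 steps of 2/255 overshoot 8/255 and are clipped.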