On Collective Robustness of Bagging Against Data Poisoning

Authors: Ruoxin Chen, Zenan Li, Jie Li, Junchi Yan, Chentao Wu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our two techniques, collective certification and hash bagging, empirically and quantitatively on four datasets. Results show: i) collective certification can yield a much stronger robustness certificate; ii) hash bagging effectively improves vanilla bagging on the certified robustness.
Researcher Affiliation | Academia | Department of Computer Science and Engineering and MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China. Jie Li and Junchi Yan are also with Shanghai AI Laboratory, Shanghai, China.
Pseudocode | Yes | Algorithm 1: Certify the collective robustness for our proposed hash bagging. ... Algorithm 2: Train the sub-classifiers. (A simplified sketch of the hash-based sub-dataset assignment appears after the table.)
Open Source Code | Yes | Our code is available at https://github.com/Emiyalzn/ICML22-CRB.
Open Datasets | Yes | We evaluate hash bagging and collective certification on two classic machine learning datasets, Bank (Moro et al., 2014) and Electricity (Harries & Wales, 1999), and two image classification datasets, FMNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky et al., 2009). ... Bank: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing. Electricity: https://datahub.io/machine-learning/electricity. Fashion-MNIST: https://github.com/zalandoresearch/fashion-mnist. CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html. (A loading sketch for the two image datasets appears after the table.)
Dataset Splits | Yes | The detailed experimental setups are shown in Table 2: Bank, 35,211 train / 10,000 test; Electricity, 35,312 train / 10,000 test; FMNIST, 60,000 train / 10,000 test; CIFAR-10, 50,000 train / 10,000 test.
Hardware Specification | Yes | All the experiments are conducted on CPU (16 Intel(R) Xeon(R) Gold 5222 CPU @ 3.80GHz) and GPU (one NVIDIA RTX 2080 Ti).
Software Dependencies | Yes | We use Gurobi 9.0 (Gurobi Optimization, 2021) to solve (P1) and (P2). (A minimal time-limit sketch using gurobipy appears after the table.)
Experiment Setup | Yes | For efficiency, we limit the time to be 2 s per sample. ... [Footnote 1] The solving time for (P1) is universally set to 2|Dtest| = 20,000 seconds; the solving time for (P2) is set to 2|Ω|, where Ω is defined in Eq. (15). ... Set the random seed for training; # Reproducible training. (from Algorithm 2; a seeding sketch appears after the table.)
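
Below is a minimal sketch of the hash-based sub-dataset assignment that Algorithm 2's training loop builds on: each training index is mapped deterministically to a sub-dataset, so poisoning m samples can change at most m sub-datasets. This is a simplified, non-overlapping variant; `hash_partition` and the md5 choice are illustrative assumptions, since the paper's hash bagging is more general (sub-datasets of arbitrary size, possibly overlapping).

```python
import hashlib

def hash_partition(num_samples: int, num_subsets: int) -> list[list[int]]:
    """Map each training index to exactly one sub-dataset via a hash of
    the index, so the assignment cannot be shifted by changing a sample's
    features or label (the property the certificate relies on)."""
    subsets = [[] for _ in range(num_subsets)]
    for idx in range(num_samples):
        h = int(hashlib.md5(str(idx).encode("utf-8")).hexdigest(), 16)
        subsets[h % num_subsets].append(idx)
    return subsets

# Example: split a 50,000-sample trainset (CIFAR-10 size) into 50 sub-datasets.
subsets = hash_partition(50_000, 50)
print(len(subsets), min(map(len, subsets)), max(map(len, subsets)))
```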
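The two image datasets can be pulled with standard torchvision loaders, which reproduce the Table 2 splits exactly; whether the released repo loads them this way is an assumption on my part. Bank and Electricity have no torchvision loader and must be fetched manually from the URLs listed above.

```python
import torchvision
from torchvision import transforms

tf = transforms.ToTensor()
# Standard splits match Table 2: FMNIST 60,000/10,000, CIFAR-10 50,000/10,000.
fmnist_train = torchvision.datasets.FashionMNIST("data", train=True, download=True, transform=tf)
fmnist_test = torchvision.datasets.FashionMNIST("data", train=False, download=True, transform=tf)
cifar_train = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=tf)
cifar_test = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=tf)
assert len(fmnist_train) == 60_000 and len(cifar_train) == 50_000
```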
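Since (P1) and (P2) are solved with Gurobi under a per-instance time cap, the sketch below shows how such a cap is set through gurobipy. The toy binary program is hypothetical; only the TimeLimit usage mirrors the reported 2 s-per-sample setup.

```python
import gurobipy as gp
from gurobipy import GRB

m = gp.Model("toy")
m.Params.TimeLimit = 2.0  # seconds per instance, as in the paper's setup
x = m.addVars(3, vtype=GRB.BINARY, name="x")
m.setObjective(x.sum(), GRB.MAXIMIZE)
m.addConstr(x[0] + x[1] <= 1, name="conflict")
m.optimize()
# If the limit is hit, Gurobi reports the best incumbent and bound found so far.
print(m.Status, m.ObjVal if m.SolCount > 0 else None)
```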
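Finally, Algorithm 2's "Set the random seed for training" step is what makes each sub-classifier a deterministic function of its sub-dataset. A common PyTorch pattern is sketched below; the `set_seed` helper and the per-subset seed derivation are illustrative assumptions, not the repo's exact code.

```python
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Fix every RNG that affects training so a rerun on the same
    sub-dataset yields the same sub-classifier (reproducible training)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# E.g., derive each sub-classifier's seed from its subset index (illustrative):
for k in range(5):
    set_seed(1000 + k)
    # ... train sub-classifier k on its hash-assigned sub-dataset ...
```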