On Collective Robustness of Bagging Against Data Poisoning
Authors: Ruoxin Chen, Zenan Li, Jie Li, Junchi Yan, Chentao Wu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our two techniques empirically and quantitatively on four datasets: collective certification and hash bagging. Results show: i) collective certification can yield a much stronger robustness certificate. ii) Hash bagging effectively improves vanilla bagging on the certified robustness. |
| Researcher Affiliation | Academia | Department of Computer Science and Engineering and MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China. Jie Li and Junchi Yan are also with Shanghai AI Laboratory, Shanghai, China. |
| Pseudocode | Yes | Algorithm 1: Certify the collective robustness for our proposed hash bagging. ... Algorithm 2: Train the sub-classifiers. |
| Open Source Code | Yes | Our code is available at https://github.com/Emiyalzn/ICML22-CRB. |
| Open Datasets | Yes | We evaluate hash bagging and collective certification on two classic machine learning datasets: Bank (Moro et al., 2014), Electricity (Harries & Wales, 1999), and two image classification datasets: FMNIST (Xiao et al., 2017), CIFAR-10 (Krizhevsky et al., 2009). ... Bank: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing. Electricity: https://datahub.io/machine-learning/electricity. Fashion-MNIST: https://github.com/zalandoresearch/fashion-mnist. CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html. |
| Dataset Splits | Yes | The detailed experimental setups are shown in Table 2. ... Bank (35,211 Trainset, 10,000 Testset) ... Electricity (35,312 Trainset, 10,000 Testset) ... FMNIST (60,000 Trainset, 10,000 Testset) ... CIFAR-10 (50,000 Trainset, 10,000 Testset) |
| Hardware Specification | Yes | All the experiments are conducted on CPU (16 Intel(R) Xeon(R) Gold 5222 CPU @ 3.80GHz) and GPU (one NVIDIA RTX 2080 Ti). |
| Software Dependencies | Yes | We use Gurobi 9.0 (Gurobi Optimization, 2021) to solve (P1) and (P2) |
| Experiment Setup | Yes | For efficiency, we limit the time to be 2s per sample. ... The solving time for (P1) is universally set to be 2\|D_test\| = 20,000 seconds. The solving time for (P2) is set to be 2\|Ω\|, where Ω is defined in Eq. (15). ... Set the random seed for training; # Reproducible training. (from Algorithm 2) |
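The hash-bagging idea quoted above (Algorithm 1/2) can be sketched minimally: a deterministic hash assigns each training index to exactly one sub-dataset, so the sub-classifier that a given (possibly poisoned) sample can influence is fixed and bounded. This is a hypothetical illustration, not the authors' implementation; the function name `hash_partition` and the use of MD5 are assumptions.

```python
import hashlib

def hash_partition(num_samples: int, num_subsets: int) -> list[list[int]]:
    """Deterministically assign each training index to one sub-dataset.

    Sketch of the hash-bagging idea: hashing the sample index (rather
    than sampling at random) makes the partition reproducible and bounds
    how many sub-classifiers one poisoned sample can affect.
    """
    subsets = [[] for _ in range(num_subsets)]
    for i in range(num_samples):
        digest = hashlib.md5(str(i).encode()).hexdigest()
        subsets[int(digest, 16) % num_subsets].append(i)
    return subsets

# Every index lands in exactly one subset, and the split is reproducible.
subsets = hash_partition(num_samples=10, num_subsets=3)
assert sorted(i for s in subsets for i in s) == list(range(10))
assert hash_partition(10, 3) == subsets
```

Each sub-classifier would then be trained on its subset with a fixed random seed, matching the "Set the random seed for training" step quoted from Algorithm 2.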