Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Collective Robustness of Bagging Against Data Poisoning

Authors: Ruoxin Chen, Zenan Li, Jie Li, Junchi Yan, Chentao Wu

ICML 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental We evaluate our two techniques, collective certification and hash bagging, empirically and quantitatively on four datasets. Results show: i) collective certification can yield a much stronger robustness certificate; ii) hash bagging effectively improves on vanilla bagging in certified robustness.
Researcher Affiliation Academia Department of Computer Science and Engineering and MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China. Jie Li and Junchi Yan are also with Shanghai AI Laboratory, Shanghai, China.
Pseudocode Yes Algorithm 1: Certify the collective robustness for our proposed hash bagging. ... Algorithm 2: Train the sub-classifiers.
Open Source Code Yes Our code is available at https://github.com/Emiyalzn/ICML22-CRB.
Open Datasets Yes We evaluate hash bagging and collective certification on two classic machine learning datasets: Bank (Moro et al., 2014) and Electricity (Harries & Wales, 1999), and two image classification datasets: FMNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky et al., 2009). ... Bank: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing. Electricity: https://datahub.io/machine-learning/electricity. Fashion-MNIST: https://github.com/zalandoresearch/fashion-mnist. CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html.
Dataset Splits Yes The detailed experimental setups are shown in Table 2. ... Bank (35,211 train / 10,000 test) ... Electricity (35,312 train / 10,000 test) ... FMNIST (60,000 train / 10,000 test) ... CIFAR-10 (50,000 train / 10,000 test)
Hardware Specification Yes All the experiments are conducted on CPU (16 Intel(R) Xeon(R) Gold 5222 CPU @ 3.80GHz) and GPU (one NVIDIA RTX 2080 Ti).
Software Dependencies Yes We use Gurobi 9.0 (Gurobi Optimization, 2021) to solve (P1) and (P2).
Experiment Setup Yes For efficiency, we limit the time to be 2s per sample. ... The solving time for (P1) is universally set to be 2|Dtest| = 20,000 seconds. The solving time for (P2) is set to be 2|Ω|, where Ω is defined in Eq. (15). ... Set the random seed for training; # Reproducible training. (from Algorithm 2)
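The two reproducibility ingredients the table highlights — a deterministic, hash-based partition of the training set (Algorithm 1's hash bagging) and seeded sub-classifier training (Algorithm 2's "Set the random seed for training") — can be sketched together. This is a minimal illustration, not the authors' implementation: the function names, the choice of SHA-256, and the stand-in "training" step are all hypothetical.

```python
import hashlib
import random


def hash_partition(n_samples, n_bags):
    """Deterministically assign each training index to one bag via a hash.

    Because the assignment depends only on the sample index, the partition
    is reproducible across runs, and each (poisoned) sample lands in
    exactly one bag. Names and hash choice here are illustrative.
    """
    bags = [[] for _ in range(n_bags)]
    for i in range(n_samples):
        digest = int(hashlib.sha256(str(i).encode()).hexdigest(), 16)
        bags[digest % n_bags].append(i)
    return bags


def train_subclassifier(bag, seed):
    # Stand-in for real model training: fixing the seed makes the run
    # deterministic, mirroring "Set the random seed for training".
    rng = random.Random(seed)
    return [round(rng.random(), 6) for _ in bag]


# Partition 12 samples into 3 bags, then train one seeded model per bag.
bags = hash_partition(n_samples=12, n_bags=3)
models = [train_subclassifier(bag, seed=k) for k, bag in enumerate(bags)]
```

Rerunning the snippet reproduces the same bags and the same "models", which is the property the report's Experiment Setup row points at.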