Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks

Authors: Jinyuan Jia, Xiaoyu Cao, Neil Zhenqiang Gong (pp. 7961-7969)

AAAI 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on MNIST and CIFAR10. For instance, our method achieves a certified accuracy of 91.1% on MNIST when arbitrarily modifying, deleting, and/or inserting 100 training examples."
Researcher Affiliation | Academia | "Jinyuan Jia, Xiaoyu Cao, Neil Zhenqiang Gong, Duke University, EMAIL"
Pseudocode | Yes | "Algorithm 1 CERTIFY. Input: A, D, k, N, De, α. Output: Predicted label and certified poisoning size for each testing example."
Open Source Code | Yes | "Code is available at: https://github.com/jjy1994/BaggingCertifyDataPoisoning"
Open Datasets | Yes | "We use MNIST and CIFAR10 datasets. The number of training examples in the two datasets are 60,000 and 50,000, respectively, which are the training datasets that we aim to certify."
Dataset Splits | No | The paper specifies training and testing sets, but does not explicitly mention a distinct validation dataset split or how it was used in the experimental setup.
Hardware Specification | Yes | "We performed experiments on a server with 80 CPUs@2.1GHz, 8 GPUs (RTX 6000), and 385 GB main memory."
Software Dependencies | No | The paper mentions software like Keras and TensorFlow but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | "Our method has three parameters, i.e., k, α, and N. Unless otherwise mentioned, we adopt the following default settings for them: α = 0.001, N = 1,000, k = 30 for MNIST, and k = 500 for CIFAR10."
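To make the tabled parameters concrete: the certified method trains N base models, each on k training examples subsampled uniformly at random, and predicts by majority vote; the vote counts are then converted into Clopper-Pearson bounds at confidence level 1 - α to derive a certified poisoning size. The sketch below illustrates only the subsample-and-vote prediction step, using a hypothetical 1-nearest-neighbor base learner on toy data rather than the paper's neural networks; `bagged_predict` and its toy dataset are illustrative assumptions, not the authors' code.

```python
import random
from collections import Counter

def bagged_predict(train, x, k=30, n_models=1000, seed=0):
    """Simplified sketch of the bagging prediction underlying Algorithm 1
    (CERTIFY): train n_models base models, each on k examples drawn uniformly
    at random with replacement from `train`, then take a majority vote.

    `train` is a list of (feature, label) pairs; the base learner here is an
    illustrative 1-nearest-neighbor rule, not the paper's architecture."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        # Subsample k training examples with replacement.
        sub = [train[rng.randrange(len(train))] for _ in range(k)]
        # Base model: predict the label of the nearest subsampled example.
        nearest = min(sub, key=lambda ex: abs(ex[0] - x))
        votes[nearest[1]] += 1
    # The full CERTIFY algorithm would turn these vote counts into
    # Clopper-Pearson confidence bounds (level 1 - alpha) and compute the
    # certified poisoning size; here we return only the majority label.
    return votes.most_common(1)[0][0]
```

Because each base model sees only k examples, modifying a few training points changes only a small fraction of subsamples, which is what makes the majority vote certifiably robust.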