Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
BagFlip: A Certified Defense Against Data Poisoning
Authors: Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate BagFlip on image classification and malware detection datasets. BagFlip is equal to or more effective than the state-of-the-art approaches for trigger-less attacks and more effective than the state-of-the-art approaches for backdoor attacks. |
| Researcher Affiliation | Academia | Yuhao Zhang University of Wisconsin-Madison EMAIL Aws Albarghouthi University of Wisconsin-Madison EMAIL Loris D'Antoni University of Wisconsin-Madison EMAIL |
| Pseudocode | No | The paper describes its algorithms and mathematical formulations in prose and equations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The implementation of BagFlip is publicly available. https://github.com/ForeverZyh/defend_framework |
| Open Datasets | Yes | We conduct experiments on MNIST, CIFAR10, EMBER [2], and Contagio (http://contagiodump.blogspot.com). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly describe validation dataset splits, percentages, or methodology for a validation set. |
| Hardware Specification | No | The paper mentions 'a single GPU' and 'a single core' for training and preparation, but it does not specify the model or type of these hardware components (e.g., NVIDIA A100, Intel Xeon). |
| Software Dependencies | No | The paper mentions training 'neural networks' and 'random forests' but does not specify any software libraries or frameworks (e.g., PyTorch, TensorFlow, Scikit-learn) along with their version numbers. |
| Experiment Setup | Yes | We train N = 1000 models and set the confidence level as 0.999 for each configuration. ... We set k = 100, 1000, 300, 30 for MNIST, CIFAR10, EMBER, and Contagio respectively when comparing to Bagging. We tune k = 80, 280 for Bagging-0.9 on MNIST and Bagging-0.95 on EMBER, respectively. And we set k = 50 for MNIST when comparing to Label Flip. |
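For context, the Bagging baseline referenced in the Experiment Setup row (train N models, each on a random bag of k training examples, then predict by majority vote) can be sketched as below. This is an illustration only, not the authors' BagFlip implementation (which additionally applies noise to the bags for certification); `train_fn` is a hypothetical stand-in for any base learner.

```python
import random
from collections import Counter

def train_bagged_ensemble(dataset, n_models, bag_size, train_fn):
    """Train n_models classifiers, each on a bag of bag_size examples
    drawn uniformly with replacement from the training set."""
    models = []
    for _ in range(n_models):
        bag = [random.choice(dataset) for _ in range(bag_size)]
        models.append(train_fn(bag))
    return models

def predict_majority(models, x):
    """Predict the label that most ensemble members vote for."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

In the paper's setup this would correspond to, e.g., `n_models=1000` and `bag_size=100` for MNIST; the certified radius then comes from analyzing how many bags a poisoned example can influence, which is outside this sketch.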