Excess Capacity and Backdoor Poisoning

Authors: Naren Manoj, Avrim Blum

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To gain a better foundational understanding of backdoor data poisoning attacks, we present a formal theoretical framework within which one can discuss backdoor data poisoning attacks for classification problems. We then use this to analyze important statistical and computational issues surrounding these attacks. On the statistical front, we identify a parameter we call the memorization capacity that captures the intrinsic vulnerability of a learning problem to a backdoor attack. This allows us to argue about the robustness of several natural learning problems to backdoor attacks. Our results favoring the attacker involve presenting explicit constructions of backdoor attacks, and our robustness results show that some natural problem settings cannot yield successful backdoor attacks. From a computational standpoint, we show that under certain assumptions, adversarial training can detect the presence of backdoors in a training set. We then show that under similar assumptions, two closely related problems we call backdoor filtering and robust generalization are nearly equivalent. This implies that it is both asymptotically necessary and sufficient to design algorithms that can identify watermarked examples in the training set in order to obtain a learning algorithm that both generalizes well to unseen data and is robust to backdoors." "Numerical Trials: To exemplify such a workflow, we implement adversarial training in a backdoor data poisoning setting. Specifically, we select a target label, inject a varying fraction of poisoned examples into the MNIST dataset (see [2]), and estimate the robust training and test loss for each choice of α."
Researcher Affiliation | Academia | Naren Sarayu Manoj, Toyota Technological Institute at Chicago, Chicago, IL 60637, nsm@ttic.edu; Avrim Blum, Toyota Technological Institute at Chicago, Chicago, IL 60637, avrim@ttic.edu
Pseudocode | Yes | "See Algorithm A.1 in the Appendix for the pseudocode of an algorithm witnessing the statement of Theorem 14."
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for its methodology.
Open Datasets | Yes | "Specifically, we select a target label, inject a varying fraction of poisoned examples into the MNIST dataset (see [2]), and estimate the robust training and test loss for each choice of α."
Dataset Splits | No | The paper mentions the 'training robust loss' and 'test-time robust loss' but does not specify a validation split.
Hardware Specification | No | The paper does not report the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions implementing adversarial training but does not list specific software dependencies or version numbers.
Experiment Setup | Yes | "Numerical Trials: To exemplify such a workflow, we implement adversarial training in a backdoor data poisoning setting. Specifically, we select a target label, inject a varying fraction of poisoned examples into the MNIST dataset (see [2]), and estimate the robust training and test loss for each choice of α. For a more detailed description of our methodology, setup, and results, please see Appendix Section B." (See the illustrative sketches below.)
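
The Open Datasets and Experiment Setup rows describe injecting a fraction α of watermarked, target-relabeled examples into MNIST. The sketch below illustrates that injection step under the assumption of a simple bottom-right corner patch as the watermark trigger; the trigger shape, patch size, target label, poisoning fraction, and the helper names add_trigger and poison_dataset are illustrative choices, not the paper's exact construction (which is described in its Appendix Section B).

import torch
from torchvision import datasets, transforms


def add_trigger(img: torch.Tensor, patch_size: int = 3) -> torch.Tensor:
    """Stamp a small bright patch in the bottom-right corner (assumed trigger)."""
    img = img.clone()
    img[..., -patch_size:, -patch_size:] = 1.0  # pixel values are in [0, 1]
    return img


def poison_dataset(dataset, alpha: float, target_label: int, seed: int = 0):
    """Mix an alpha fraction of watermarked, target-relabeled examples into the
    clean training set and return the resulting (images, labels) tensors."""
    images = torch.stack([img for img, _ in dataset])   # (N, 1, 28, 28)
    labels = torch.tensor([lbl for _, lbl in dataset])
    n_poison = int(alpha * len(dataset))
    gen = torch.Generator().manual_seed(seed)
    poison_idx = torch.randperm(len(dataset), generator=gen)[:n_poison]
    for i in poison_idx:
        images[i] = add_trigger(images[i])
        labels[i] = target_label                         # attacker's target label
    return images, labels


if __name__ == "__main__":
    mnist = datasets.MNIST(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())
    X, y = poison_dataset(mnist, alpha=0.05, target_label=0)  # alpha is illustrative
    print(X.shape, y.bincount())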
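
The same rows state that adversarial training is run on the poisoned data and that the robust training and test losses are estimated for each α. The sketch below instantiates "adversarial training" with l_inf projected gradient descent (PGD), which is an assumption: the excerpt does not specify the perturbation class, budget, optimizer, or architecture, so eps, step, iters, and the function names pgd_attack, robust_loss, and adversarial_train are all illustrative.

import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=0.3, step=0.05, iters=10):
    """Search for an l_inf perturbation of magnitude <= eps that maximizes the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        F.cross_entropy(model(x + delta), y).backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.data = (x + delta).clamp(0.0, 1.0) - x  # keep pixels in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()


def robust_loss(model, loader, **pgd_kwargs):
    """Average adversarial (robust) cross-entropy loss over a data loader."""
    model.eval()
    total, count = 0.0, 0
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, **pgd_kwargs)
        with torch.no_grad():
            total += F.cross_entropy(model(x_adv), y, reduction="sum").item()
        count += len(y)
    return total / count


def adversarial_train(model, loader, epochs=5, lr=1e-3, **pgd_kwargs):
    """Train on adversarially perturbed minibatches (standard adversarial training)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x_adv = pgd_attack(model, x, y, **pgd_kwargs)
            opt.zero_grad()  # clears gradients accumulated during the attack
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
    return model

A full reproduction would wrap these pieces in a sweep over poisoning fractions α: build each poisoned training set with poison_dataset above, fit a model with adversarial_train, and record robust_loss on the poisoned training data and on a held-out test set. The actual grid of α values, model, and attack parameters used in the paper are described in its Appendix Section B and are not reproduced here.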