Excess Capacity and Backdoor Poisoning

Authors: Naren Manoj, Avrim Blum

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To gain a better foundational understanding of backdoor data poisoning attacks, we present a formal theoretical framework within which one can discuss backdoor data poisoning attacks for classification problems. We then use this to analyze important statistical and computational issues surrounding these attacks. On the statistical front, we identify a parameter we call the memorization capacity that captures the intrinsic vulnerability of a learning problem to a backdoor attack. This allows us to argue about the robustness of several natural learning problems to backdoor attacks. Our results favoring the attacker involve presenting explicit constructions of backdoor attacks, and our robustness results show that some natural problem settings cannot yield successful backdoor attacks. From a computational standpoint, we show that under certain assumptions, adversarial training can detect the presence of backdoors in a training set. We then show that under similar assumptions, two closely related problems we call backdoor filtering and robust generalization are nearly equivalent. This implies that it is both asymptotically necessary and sufficient to design algorithms that can identify watermarked examples in the training set in order to obtain a learning algorithm that both generalizes well to unseen data and is robust to backdoors." "Numerical Trials: To exemplify such a workflow, we implement adversarial training in a backdoor data poisoning setting. Specifically, we select a target label, inject a varying fraction of poisoned examples into the MNIST dataset (see [2]), and estimate the robust training and test loss for each choice of α."
Researcher Affiliation | Academia | Naren Sarayu Manoj, Toyota Technological Institute at Chicago, Chicago, IL 60637, nsm@ttic.edu; Avrim Blum, Toyota Technological Institute at Chicago, Chicago, IL 60637, avrim@ttic.edu
Pseudocode | Yes | "See Algorithm A.1 in the Appendix for the pseudocode of an algorithm witnessing the statement of Theorem 14."
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for its methodology.
Open Datasets | Yes | "Specifically, we select a target label, inject a varying fraction of poisoned examples into the MNIST dataset (see [2]), and estimate the robust training and test loss for each choice of α."
Dataset Splits | No | The paper mentions the 'training robust loss' and 'test-time robust loss' but does not specify a validation split.
Hardware Specification | No | The paper does not report the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions implementing adversarial training but does not list specific software dependencies or version numbers.
Experiment Setup | Yes | "Numerical Trials: To exemplify such a workflow, we implement adversarial training in a backdoor data poisoning setting. Specifically, we select a target label, inject a varying fraction of poisoned examples into the MNIST dataset (see [2]), and estimate the robust training and test loss for each choice of α. For a more detailed description of our methodology, setup, and results, please see Appendix Section B." (See the illustrative sketches below.)
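
The Open Datasets and Experiment Setup rows describe injecting a fraction α of watermarked, target-relabeled examples into MNIST. The sketch below illustrates that injection step under the assumption of a simple bottom-right corner patch as the watermark trigger; the trigger shape, patch size, target label, poisoning fraction, and the helper names add_trigger and poison_dataset are illustrative choices, not the paper's exact construction (which is described in its Appendix Section B).

import torch
from torchvision import datasets, transforms


def add_trigger(img: torch.Tensor, patch_size: int = 3) -> torch.Tensor:
    """Stamp a small bright patch in the bottom-right corner (assumed trigger)."""
    img = img.clone()
    img[..., -patch_size:, -patch_size:] = 1.0  # pixel values are in [0, 1]
    return img


def poison_dataset(dataset, alpha: float, target_label: int, seed: int = 0):
    """Mix an alpha fraction of watermarked, target-relabeled examples into the
    clean training set and return the resulting (images, labels) tensors."""
    images = torch.stack([img for img, _ in dataset])   # (N, 1, 28, 28)
    labels = torch.tensor([lbl for _, lbl in dataset])
    n_poison = int(alpha * len(dataset))
    gen = torch.Generator().manual_seed(seed)
    poison_idx = torch.randperm(len(dataset), generator=gen)[:n_poison]
    for i in poison_idx:
        images[i] = add_trigger(images[i])
        labels[i] = target_label                         # attacker's target label
    return images, labels


if __name__ == "__main__":
    mnist = datasets.MNIST(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())
    X, y = poison_dataset(mnist, alpha=0.05, target_label=0)  # alpha is illustrative
    print(X.shape, y.bincount())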
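
The same rows state that adversarial training is run on the poisoned data and that the robust training and test losses are estimated for each α. The sketch below instantiates "adversarial training" with l_inf projected gradient descent (PGD), which is an assumption: the excerpt does not specify the perturbation class, budget, optimizer, or architecture, so eps, step, iters, and the function names pgd_attack, robust_loss, and adversarial_train are all illustrative.

import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=0.3, step=0.05, iters=10):
    """Search for an l_inf perturbation of magnitude <= eps that maximizes the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        F.cross_entropy(model(x + delta), y).backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.data = (x + delta).clamp(0.0, 1.0) - x  # keep pixels in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()


def robust_loss(model, loader, **pgd_kwargs):
    """Average adversarial (robust) cross-entropy loss over a data loader."""
    model.eval()
    total, count = 0.0, 0
    for x, y in loader:
        x_adv = pgd_attack(model, x, y, **pgd_kwargs)
        with torch.no_grad():
            total += F.cross_entropy(model(x_adv), y, reduction="sum").item()
        count += len(y)
    return total / count


def adversarial_train(model, loader, epochs=5, lr=1e-3, **pgd_kwargs):
    """Train on adversarially perturbed minibatches (standard adversarial training)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x_adv = pgd_attack(model, x, y, **pgd_kwargs)
            opt.zero_grad()  # clears gradients accumulated during the attack
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
    return model

A full reproduction would wrap these pieces in a sweep over poisoning fractions α: build each poisoned training set with poison_dataset above, fit a model with adversarial_train, and record robust_loss on the poisoned training data and on a held-out test set. The actual grid of α values, model, and attack parameters used in the paper are described in its Appendix Section B and are not reproduced here.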