PubDef: Defending Against Transfer Attacks From Public Models

Authors: Chawin Sitawarin, Jaewon Chang, David Huang, Wesson Altoyan, David Wagner

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective. The defenses are evaluated under 24 public models and 11 attack algorithms across three datasets (CIFAR-10, CIFAR-100, and ImageNet). Under this threat model, our defense, PUBDEF, outperforms the state-of-the-art white-box adversarial training by a large margin with almost no loss in the normal accuracy. For instance, on ImageNet, our defense achieves 62% accuracy under the strongest transfer attack vs only 36% of the best adversarially trained model. Its accuracy when not under attack is only 2% lower than that of an undefended model (78% vs 80%). (A minimal sketch of this transfer-attack evaluation follows the table.)
Researcher Affiliation | Collaboration | Chawin Sitawarin (UC Berkeley), Jaewon Chang (UC Berkeley), David Huang (UC Berkeley), Wesson Altoyan (King Abdulaziz City for Science and Technology), David Wagner (UC Berkeley)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available here.
Open Datasets | Yes | The defenses are evaluated under 24 public models and 11 attack algorithms across three datasets (CIFAR-10, CIFAR-100, and ImageNet).
Dataset Splits | Yes | The clean accuracy is simply the accuracy on the test set, with no attack.
Hardware Specification | Yes | All of the models are trained on NVIDIA A100 GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | All CIFAR-10/100 models are trained for 200 epochs with a learning rate of 0.1, weight decay of 5e-4, and a batch size of 2048. ImageNet models are trained for 50 epochs with a learning rate of 0.1, weight decay of 1e-4, and a batch size of 512. (A hyperparameter sketch follows the table.)
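
The Research Type row reports accuracy under transfer attacks crafted on public surrogate models and evaluated against the defended target. The sketch below illustrates how such an evaluation can be run, assuming PyTorch: adversarial examples are generated against a public surrogate and then fed to the target. The use of plain L-inf PGD, the epsilon budget, step count, and the function names (pgd_attack, transfer_accuracy) are illustrative assumptions, not the paper's 11 attack algorithms or exact settings.

```python
import torch
import torch.nn.functional as F

def pgd_attack(surrogate, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft L-inf PGD adversarial examples against the public surrogate model."""
    # Random start inside the epsilon ball, clipped to valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the L-inf ball around the clean input and the valid range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def transfer_accuracy(target, surrogate, loader, device="cuda"):
    """Accuracy of the defended target model on examples crafted against the surrogate."""
    target.eval()
    surrogate.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(surrogate, x, y)  # gradients come from the surrogate only
        with torch.no_grad():
            correct += (target(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total
```

A transfer attack in this sense never queries gradients of the target model; the target is only used for the final forward pass, which is what distinguishes this threat model from white-box evaluation.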
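
The Experiment Setup row quotes the training hyperparameters verbatim. Below is a minimal sketch that encodes them as a configuration, assuming PyTorch; only the epochs, learning rate, weight decay, and batch size come from the excerpt, while the choice of SGD with momentum 0.9 and the helper name build_optimizer are assumptions for illustration, since the excerpt does not state the optimizer or scheduler.

```python
import torch

# Hyperparameters quoted in the Experiment Setup row; everything else is assumed.
TRAIN_CONFIG = {
    "cifar": {"epochs": 200, "lr": 0.1, "weight_decay": 5e-4, "batch_size": 2048},
    "imagenet": {"epochs": 50, "lr": 0.1, "weight_decay": 1e-4, "batch_size": 512},
}

def build_optimizer(model, dataset="cifar"):
    """Return an optimizer configured with the reported learning rate and weight decay."""
    cfg = TRAIN_CONFIG[dataset]
    return torch.optim.SGD(
        model.parameters(),
        lr=cfg["lr"],
        momentum=0.9,  # assumed; the excerpt does not state the optimizer or momentum
        weight_decay=cfg["weight_decay"],
    )
```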