Defending against Adversarial Audio via Diffusion Model

Authors: Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the speech command recognition task to evaluate the robustness of AudioPure. Our method is effective against diverse adversarial attacks (e.g., bounded by the L2 or L∞ norm). It outperforms the existing methods under both strong adaptive white-box and black-box attacks bounded by the L2 or L∞ norm (up to +20% in robust accuracy). Besides, we also evaluate the certified robustness for perturbations bounded by the L2 norm via randomized smoothing. |
| Researcher Affiliation | Collaboration | ¹Arizona State University, ²Shanghai Jiao Tong University, ³NVIDIA |
| Pseudocode | No | The paper includes diagrams but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/cychomatica/AudioPure. |
| Open Datasets | Yes | We use the Speech Commands dataset (Warden, 2018), which consists of 85,511 training utterances, 10,102 validation utterances, and 4,890 test utterances. |
| Dataset Splits | Yes | We use the Speech Commands dataset (Warden, 2018), which consists of 85,511 training utterances, 10,102 validation utterances, and 4,890 test utterances. |
| Hardware Specification | Yes | We evaluate it on an NVIDIA RTX 3090 GPU with an Intel Core i9-10920X CPU @ 3.50 GHz and 64 GB RAM. |
| Software Dependencies | No | The paper mentions using Python, PyTorch, and specific models such as Improved DDPM and DiffWave, but does not provide version numbers for these software dependencies (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For the UNet model, we set image_size = 32, num_channels = 3, and num_res_blocks = 128. For the diffusion flags, we set N = 200, β1 = 0.0001, βN = 0.02 and use the linear variance schedule. For model training, we set the learning rate to 1e-4 and the batch size to 230. |
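The linear variance schedule quoted in the experiment setup (N = 200, β1 = 0.0001, βN = 0.02) can be sketched as below. This is a minimal illustration of a standard DDPM-style forward process under those values; the `diffuse` helper and the 16 kHz placeholder waveform are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

# Linear variance schedule from the reported setup:
# N = 200 diffusion steps, beta_1 = 1e-4, beta_N = 0.02.
N, beta_1, beta_N = 200, 1e-4, 0.02
betas = np.linspace(beta_1, beta_N, N)   # linearly spaced variances
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # cumulative product \bar{alpha}_t

def diffuse(x0, t, rng=np.random.default_rng(0)):
    """Hypothetical helper: forward-noise a clean waveform x0 to step t
    using the closed-form DDPM forward process."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

x0 = np.zeros(16000)        # placeholder: 1 s of audio at 16 kHz
x_t = diffuse(x0, t=10)     # lightly noised input, as in diffusion-based purification
```

In diffusion-based purification such as AudioPure, only a small number of forward steps like this are applied before the reverse (denoising) model removes both the added noise and the adversarial perturbation.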