Defending against Adversarial Audio via Diffusion Model
Authors: Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the speech command recognition task to evaluate the robustness of AudioPure. Our method is effective against diverse adversarial attacks (e.g., L2- or L∞-norm bounded). It outperforms the existing methods under both strong adaptive white-box and black-box attacks bounded by the L2 or L∞ norm (up to +20% in robust accuracy). Besides, we also evaluate the certified robustness for perturbations bounded by the L2-norm via randomized smoothing (see the randomized-smoothing sketch below the table). |
| Researcher Affiliation | Collaboration | Arizona State University, Shanghai Jiao Tong University, NVIDIA |
| Pseudocode | No | The paper includes diagrams but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/cychomatica/AudioPure. |
| Open Datasets | Yes | We use the Speech Commands dataset (Warden, 2018), which consists of 85,511 training utterances, 10,102 validation utterances, and 4,890 test utterances. |
| Dataset Splits | Yes | We use the Speech Commands dataset (Warden, 2018), which consists of 85,511 training utterances, 10,102 validation utterances, and 4,890 test utterances. |
| Hardware Specification | Yes | We evaluate it on an NVIDIA RTX 3090 GPU with Intel Core i9-10920X CPU @ 3.50GHz and 64 GB RAM. |
| Software Dependencies | No | The paper mentions using Python, PyTorch, and specific models like Improved DDPM and DiffWave, but does not provide version numbers for these software dependencies (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For the UNet model, we set image_size = 32, num_channels = 128, and num_res_blocks = 3. For the diffusion flags, we set N = 200, β1 = 0.0001, βN = 0.02 and use the linear variance schedule (see the schedule sketch below the table). For the model training, we set the learning rate to 1e-4 and the batch size to 230. |
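
The experiment-setup row quotes the diffusion hyperparameters (N = 200, β1 = 0.0001, βN = 0.02, linear schedule). Below is a minimal PyTorch sketch of that schedule and the forward-diffusion step it implies; the `diffuse` helper is our own illustration of standard DDPM conventions, not code taken from the AudioPure repository.

```python
import torch

# Hyperparameters quoted in the experiment-setup row above.
N = 200          # number of diffusion steps
beta_1 = 0.0001  # first noise-variance value
beta_N = 0.02    # last noise-variance value

# Linear variance schedule: beta_t interpolates linearly from beta_1 to beta_N.
betas = torch.linspace(beta_1, beta_N, N)

# Standard DDPM quantities derived from the schedule (assumed conventions,
# not verified against the authors' code).
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def diffuse(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Illustrative forward step: add t steps of Gaussian noise to a clean
    signal x0, i.e. sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = torch.randn_like(x0)
    return alpha_bars[t].sqrt() * x0 + (1.0 - alpha_bars[t]).sqrt() * eps
```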
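
The certified-robustness claim in the Research Type row relies on randomized smoothing. The sketch below shows the standard Monte Carlo certificate of Cohen et al. (2019), which we assume the evaluation follows; `model`, `certify_radius`, and all default parameter values are hypothetical placeholders rather than the authors' API, and for brevity a single sample batch serves both class selection and probability estimation.

```python
import torch
from scipy.stats import binomtest, norm

def certify_radius(model, x, sigma=0.25, n=1000, alpha=0.001, num_classes=10):
    """Estimate the smoothed prediction for input x and its L2 certified
    radius (Cohen et al., 2019). All names and values are illustrative."""
    with torch.no_grad():
        # Draw n Gaussian-noised copies of the input and classify them.
        noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
        preds = model(noisy).argmax(dim=1)
    counts = torch.bincount(preds, minlength=num_classes)
    top = counts.argmax().item()
    # Clopper-Pearson lower confidence bound on the top-class probability.
    p_lower = binomtest(counts[top].item(), n).proportion_ci(
        confidence_level=1 - alpha, method="exact").low
    if p_lower <= 0.5:
        return top, 0.0  # abstain: no nonzero radius can be certified
    return top, sigma * norm.ppf(p_lower)  # R = sigma * Phi^{-1}(p_lower)
```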