Defending against Adversarial Audio via Diffusion Model

Authors: Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the speech command recognition task to evaluate the robustness of AudioPure. Our method is effective against diverse adversarial attacks (e.g., bounded by the L2 or L∞ norm). It outperforms the existing methods under both strong adaptive white-box and black-box attacks bounded by the L2 or L∞ norm (up to +20% in robust accuracy). Besides, we also evaluate the certified robustness for perturbations bounded by the L2 norm via randomized smoothing. |
| Researcher Affiliation | Collaboration | ¹Arizona State University, ²Shanghai Jiao Tong University, ³NVIDIA |
| Pseudocode | No | The paper includes diagrams but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/cychomatica/AudioPure. |
| Open Datasets | Yes | We use the Speech Commands dataset (Warden, 2018), which consists of 85,511 training utterances, 10,102 validation utterances, and 4,890 test utterances. |
| Dataset Splits | Yes | We use the Speech Commands dataset (Warden, 2018), which consists of 85,511 training utterances, 10,102 validation utterances, and 4,890 test utterances. |
| Hardware Specification | Yes | We evaluate it on an NVIDIA RTX 3090 GPU with an Intel Core i9-10920X CPU @ 3.50 GHz and 64 GB RAM. |
| Software Dependencies | No | The paper mentions using Python, PyTorch, and specific models such as Improved DDPM and DiffWave, but does not provide version numbers for these software dependencies (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For the UNet model, we set image_size = 32, num_channels = 3, and num_res_blocks = 128. For the diffusion flags, we set N = 200, β1 = 0.0001, βN = 0.02 and use the linear variance schedule. For model training, we set the learning rate to 1e-4 and the batch size to 230. |
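The linear variance schedule quoted in the experiment setup (N = 200, β1 = 0.0001, βN = 0.02) can be sketched as below. This is a minimal illustration of a standard DDPM-style forward process under those values; the `diffuse` helper and the 16 kHz placeholder waveform are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

# Linear variance schedule from the reported setup:
# N = 200 diffusion steps, beta_1 = 1e-4, beta_N = 0.02.
N, beta_1, beta_N = 200, 1e-4, 0.02
betas = np.linspace(beta_1, beta_N, N)   # linearly spaced variances
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # cumulative product \bar{alpha}_t

def diffuse(x0, t, rng=np.random.default_rng(0)):
    """Hypothetical helper: forward-noise a clean waveform x0 to step t
    using the closed-form DDPM forward process."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

x0 = np.zeros(16000)        # placeholder: 1 s of audio at 16 kHz
x_t = diffuse(x0, t=10)     # lightly noised input, as in diffusion-based purification
```

In diffusion-based purification such as AudioPure, only a small number of forward steps like this are applied before the reverse (denoising) model removes both the added noise and the adversarial perturbation.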