AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
Authors: Yuancheng Wang, Zeqian Ju, Xu Tan, Lei He, Zhizheng Wu, Jiang Bian, Sheng Zhao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | AUDIT achieves state-of-the-art results in both objective and subjective metrics for several audio editing tasks (e.g., adding, dropping, replacement, inpainting, super-resolution). |
| Researcher Affiliation | Collaboration | Yuancheng Wang¹,², Zeqian Ju¹, Xu Tan¹, Lei He¹, Zhizheng Wu², Jiang Bian¹, Sheng Zhao¹ (¹Microsoft, ²The Chinese University of Hong Kong, Shenzhen) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | Demo samples are available at https://audit-demopage.github.io/. This is a demo page rather than a link to the source code for the methodology, and the paper makes no explicit statement of a code release. |
| Open Datasets | Yes | The datasets used in our work consist of AudioCaps [22], AudioSet [12], FSD50K [11], and ESC50 [41]. |
| Dataset Splits | No | The paper states "We use a total of about 0.6M triplet data to train our audio editing model." but does not provide specific training/validation/test split information, nor does it explicitly mention a validation set that would be needed for reproduction. |
| Hardware Specification | Yes | Our models are trained on 8 NVIDIA V100 GPUs for 500K steps with a batch size of 2 on each device. |
| Software Dependencies | No | The paper does not provide specific version numbers for ancillary software components or libraries (e.g., PyTorch, Python, CUDA versions). |
| Experiment Setup | Yes | We train our autoencoder model with a batch size of 32 (8 per device) on 8 NVIDIA V100 GPUs for a total of 50000 steps with a learning rate of 7.5e-5. For both audio editing and U-Net audio generative diffusion, we train with a batch size of 8 on 8 NVIDIA V100 GPUs for a total of 500000 steps with a learning rate of 5e-5. Both the autoencoder and diffusion models use AdamW [29] as the optimizer with (β1, β2) = (0.9, 0.999) and weight decay of 1e-2. (A hedged sketch of this optimizer configuration appears below the table.) |
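The Experiment Setup row quotes concrete optimizer hyperparameters (AdamW, (β1, β2) = (0.9, 0.999), weight decay 1e-2, learning rates 7.5e-5 and 5e-5). The sketch below shows, in PyTorch, how those reported values map onto an optimizer configuration. The `TinyPlaceholderModel` and the `stage` argument are hypothetical stand-ins, not the authors' code; only the numeric hyperparameters come from the paper.

```python
# Minimal sketch of the reported optimizer configuration, assuming PyTorch.
# Only the hyperparameter values are taken from the paper; the model and
# training loop are placeholders for illustration.
import torch


class TinyPlaceholderModel(torch.nn.Module):
    """Hypothetical stand-in for the autoencoder / latent-diffusion U-Net."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.net(x)


def make_optimizer(model: torch.nn.Module, stage: str) -> torch.optim.AdamW:
    # Learning rates as quoted: 7.5e-5 for the autoencoder (50K steps),
    # 5e-5 for the editing/generation diffusion models (500K steps).
    lr = 7.5e-5 if stage == "autoencoder" else 5e-5
    return torch.optim.AdamW(
        model.parameters(),
        lr=lr,
        betas=(0.9, 0.999),  # (beta1, beta2) as reported
        weight_decay=1e-2,   # weight decay as reported
    )


if __name__ == "__main__":
    model = TinyPlaceholderModel()
    opt = make_optimizer(model, stage="diffusion")
    # One dummy step to show the optimizer is wired up; the real objective
    # (diffusion / reconstruction loss) is not reproduced here.
    x = torch.randn(8, 16)  # batch size 8 per the diffusion setup
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
```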