From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Authors: Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed approach considering both objective metrics and human studies. As we demonstrate empirically, such an approach can be applied to a wide variety of tasks and audio domains to replace the traditional GAN-based decoders. |
| Researcher Affiliation | Collaboration | Robin San Roman (FAIR Team, Meta; Université de Lorraine, CNRS, Inria, LORIA, Nancy, France), Yossi Adi (FAIR Team, Meta; The Hebrew University of Jerusalem), Antoine Deleforge and Romain Serizel (Université de Lorraine, CNRS, Inria, LORIA, Nancy, France), Gabriel Synnaeve and Alexandre Défossez (FAIR Team, Meta) |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | Training and evaluation code are available on the facebookresearch/audiocraft GitHub project (see the usage sketch after this table). |
| Open Datasets | Yes | We use speech from the train set of Common Voice 7.0 (9096 hours) [Ardila et al., 2019] together with the DNS challenge 4 (2425 hours) [Dubey et al., 2022]. For music, we use the MTG-Jamendo dataset (919 hours) [Bogdanov et al., 2019]. For the environmental sound we use FSD50K (108 hours) [Fonseca et al., 2021] and Audio Set (4989 hours) [Gemmeke et al., 2017]. |
| Dataset Splits | No | The paper does not explicitly state the training/validation/test splits (as percentages, absolute counts, or references to predefined splits). It mentions using 'the train set of Common Voice' and sampling from a 'test set', but gives no details on how the data was partitioned for validation. |
| Hardware Specification | Yes | It takes around 2 days on 4 Nvidia V100 GPUs with 16 GB of memory to train one of the 4 models. |
| Software Dependencies | No | The paper mentions software like the 'ViSQOL [Chinen et al., 2020] metric' and 'julius' with a GitHub link, but it does not specify version numbers for these or other key software components (e.g., programming language, deep learning frameworks), which are required for a reproducible dependency description. (See the julius sketch after this table.) |
| Experiment Setup | Yes | We trained our diffusion models using our proposed power schedule with power p = 7.5, β0 = 1.0e-5 and βT = 2.9e-2. ... We train our models using Adam optimizer with batch size 128 and a learning rate of 1e-4. (See the schedule sketch after this table.) |
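Since the open-source row points at facebookresearch/audiocraft, a minimal decoding sketch follows. The class and method names (`MultiBandDiffusion.get_mbd_24khz`, `regenerate`) follow the project's Multi-Band Diffusion documentation at the time of writing; treat them as assumptions and check the repository for the current API. The input file name is hypothetical.

```python
import torchaudio
from audiocraft.models import MultiBandDiffusion

# Load a pretrained Multi-Band Diffusion decoder for 24 kHz EnCodec
# tokens; bw selects the EnCodec bandwidth (1.5, 3.0, or 6.0 kbps).
mbd = MultiBandDiffusion.get_mbd_24khz(bw=3.0)

wav, sr = torchaudio.load("input.wav")  # hypothetical input file
# Round-trip the waveform through EnCodec tokens and the diffusion decoder.
regenerated = mbd.regenerate(wav, sample_rate=sr)
torchaudio.save("regenerated.wav", regenerated.squeeze(0).cpu(), 24_000)
```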
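The dependencies row mentions julius, which provides the band-splitting filters used by a multi-band approach like this one (one diffusion model per frequency band). Below is a hedged sketch using `julius.split_bands`; the choice of 4 bands matches the 4 models mentioned in the hardware row, but the mel-spaced cutoffs implied by `n_bands` are illustrative, not necessarily the paper's exact cutoffs.

```python
import torch
import julius

sr = 24_000
wav = torch.randn(1, sr)  # one second of dummy mono audio at 24 kHz

# Split into 4 frequency bands; julius stacks the bands on a new
# leading dimension, and the bands sum back to (approximately) the input.
bands = julius.split_bands(wav, sample_rate=sr, n_bands=4)
print(bands.shape)               # (4, 1, 24000)
reconstructed = bands.sum(dim=0) # ~= wav
```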
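The setup row quotes the proposed power noise schedule (p = 7.5, β0 = 1.0e-5, βT = 2.9e-2). A minimal sketch is below, assuming the schedule interpolates linearly between β0 and βT in p-th-root space and raises the result back to the power p; the number of diffusion steps T is illustrative, and the paper's own definition should be taken as authoritative.

```python
import numpy as np

def power_beta_schedule(T: int = 1000, p: float = 7.5,
                        beta0: float = 1.0e-5,
                        betaT: float = 2.9e-2) -> np.ndarray:
    """Assumed form: interpolate between beta0 and betaT in p-th-root space."""
    t = np.linspace(0.0, 1.0, T)
    root = beta0 ** (1 / p) + t * (betaT ** (1 / p) - beta0 ** (1 / p))
    return root ** p

betas = power_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # standard DDPM cumulative product
# Training used Adam with batch size 128 and learning rate 1e-4 (per the paper).
```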