Catch-A-Waveform: Learning to Generate Audio from a Single Short Example

Authors: Gal Greshler, Tamar Rott Shaham, Tomer Michaeli

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We test our Catch-A-Waveform (CAW) method in several applications and evaluate it both qualitatively and quantitatively. Our training examples contain a variety of audio types, including polyphonic rock and pop music, monophonic instrumental music, speech, and ambient sounds." |
| Researcher Affiliation | Academia | Gal Greshler (galgreshler@gmail.com), Tamar Rott Shaham (stamarot@campus.technion.ac.il), and Tomer Michaeli (tomer.m@ee.technion.ac.il), all with the Technion, Israel Institute of Technology. |
| Pseudocode | No | The paper describes the model in detail and includes figures, but provides no formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | "Code is available at https://github.com/galgreshler/Catch-A-Waveform" |
| Open Datasets | Yes | "We compare our BE [bandwidth extension] results to the state-of-the-art temporal FiLM (TFiLM) method [6], which requires a large training set to perform this task. We use the VCTK dataset, and report both the signal-to-noise ratio (SNR) and the log-spectral distance (LSD) [17] between the recovered signal and the ground-truth one, averaged over a test set." (A sketch of these two metrics appears below the table.) |
| Dataset Splits | No | The paper mentions training on short signals (e.g., 20-25 seconds) and testing on held-out data or specific test sets (e.g., "held-out sentences of the same speaker" for VCTK), but does not provide specific percentages, sample counts, or references to predefined train/validation/test splits for its experiments. |
| Hardware Specification | Yes | "Training on a 25 second long signal takes about 10 hours on Nvidia GeForce RTX 2080." |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but does not provide version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | "For training, we use the Adam optimizer [28] with (β1, β2) = (0.5, 0.999) and learning rate 0.0015, which we reduce by a factor of 10 after two thirds of the epochs (we run a total of 3000 epochs)." (An optimizer sketch appears below the table.) |
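
The Open Datasets row cites two waveform metrics, SNR and log-spectral distance (LSD). For reference, here is a minimal NumPy/SciPy sketch of one common definition of each; the STFT frame length, sample rate default, and epsilon floor are assumptions, since the excerpt does not state the paper's exact parameters.

```python
import numpy as np
from scipy.signal import stft

def snr_db(ref, est):
    """SNR in dB between a ground-truth signal and a recovered one."""
    noise_power = np.sum((ref - est) ** 2)
    return 10.0 * np.log10(np.sum(ref ** 2) / noise_power)

def lsd(ref, est, fs=16000, nperseg=2048):
    """Log-spectral distance: RMS difference of log power spectra per
    STFT frame, averaged over frames (frame size is an assumption)."""
    _, _, R = stft(ref, fs=fs, nperseg=nperseg)
    _, _, E = stft(est, fs=fs, nperseg=nperseg)
    eps = 1e-10  # numerical floor, not from the paper
    diff = np.log10(np.abs(R) ** 2 + eps) - np.log10(np.abs(E) ** 2 + eps)
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=0))))
```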
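The Experiment Setup row fully specifies the optimizer configuration. Below is a minimal PyTorch sketch of that schedule; the one-layer model, dummy input, and squared-output loss are placeholders standing in for CAW's multi-scale GAN, which the excerpt does not describe.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import MultiStepLR

TOTAL_EPOCHS = 3000  # "we run a total of 3000 epochs"

model = nn.Conv1d(1, 16, kernel_size=9, padding=4)  # placeholder, not CAW's generator
x = torch.randn(1, 1, 16000)                        # stand-in for one audio example

# Adam with (beta1, beta2) = (0.5, 0.999) and learning rate 0.0015.
optimizer = optim.Adam(model.parameters(), lr=0.0015, betas=(0.5, 0.999))
# Drop the learning rate by a factor of 10 after two thirds of the epochs.
scheduler = MultiStepLR(optimizer, milestones=[2 * TOTAL_EPOCHS // 3], gamma=0.1)

for epoch in range(TOTAL_EPOCHS):
    optimizer.zero_grad()
    loss = model(x).pow(2).mean()  # placeholder loss, not the paper's adversarial loss
    loss.backward()
    optimizer.step()
    scheduler.step()
```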