Catch-A-Waveform: Learning to Generate Audio from a Single Short Example
Authors: Gal Greshler, Tamar Rott Shaham, Tomer Michaeli
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our Catch-A-Waveform (CAW) method in several applications and evaluate it both qualitatively and quantitatively. Our training examples contain a variety of audio types, including polyphonic rock and pop music, monophonic instrumental music, speech, and ambient sounds. |
| Researcher Affiliation | Academia | Gal Greshler, Technion Israel Institute of Technology, galgreshler@gmail.com; Tamar Rott Shaham, Technion Israel Institute of Technology, stamarot@campus.technion.ac.il; Tomer Michaeli, Technion Israel Institute of Technology, tomer.m@ee.technion.ac.il |
| Pseudocode | No | The paper describes the model in detail and includes figures, but no formal pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Code is available at https://github.com/galgreshler/Catch-A-Waveform |
| Open Datasets | Yes | We compare our bandwidth extension (BE) results to the state-of-the-art temporal FiLM (TFiLM) method [6], which requires a large training set to perform this task. We use the VCTK dataset, and report both the signal-to-noise ratio (SNR) and the log-spectral distance (LSD) [17] between the recovered signal and the ground-truth one, averaged over a test set. (A hedged sketch of these two metrics appears after the table.) |
| Dataset Splits | No | The paper mentions training on short signals (e.g., 20-25 seconds) and testing on held-out data or specific test sets (e.g., 'held-out sentences of the same speaker' for VCTK), but does not provide specific percentages, sample counts, or clear references to predefined train/validation/test splits for their experiments. |
| Hardware Specification | Yes | Training on a 25 second long signal takes about 10 hours on an Nvidia GeForce RTX 2080. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | For training, we use the Adam optimizer [28] with (β1, β2) = (0.5, 0.999) and learning rate 0.0015, which we reduce by a factor of 10 after two thirds of the epochs (we run a total of 3000 epochs). (See the optimizer sketch below the table.) |
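
The training configuration quoted in the Experiment Setup row can be mirrored in a few lines of PyTorch. The sketch below is only an illustration under assumptions: `model` is a placeholder for the actual CAW generator/discriminator networks and the loop body is omitted; only the Adam settings and the factor-of-10 learning-rate drop after two thirds of the 3000 epochs come from the paper.

```python
# Hedged sketch of the optimizer schedule described in the paper:
# Adam with (beta1, beta2) = (0.5, 0.999), lr = 0.0015, reduced by 10x
# after two thirds of the 3000 epochs. `model` is a placeholder module,
# not the actual CAW networks.
import torch

total_epochs = 3000
model = torch.nn.Linear(1, 1)  # placeholder for the CAW generator/discriminator

optimizer = torch.optim.Adam(model.parameters(), lr=0.0015, betas=(0.5, 0.999))
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[2 * total_epochs // 3], gamma=0.1
)

for epoch in range(total_epochs):
    # ... forward pass, loss computation, and optimizer.step() go here ...
    scheduler.step()
```

Using `MultiStepLR` with a single milestone at epoch 2000 reproduces one scheduled drop by a factor of 10; the paper does not state how the drop is implemented in the released code.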
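
For the SNR and LSD numbers referenced in the Open Datasets row, a minimal sketch of the two metrics is given below. It assumes the standard definitions (waveform-domain SNR in dB, and the frame-averaged RMS difference of log power spectra for LSD); the STFT parameters and log base are assumptions, not taken from the paper's evaluation code.

```python
# Hedged sketch: SNR and log-spectral distance (LSD) between a reference
# waveform and a reconstruction. Definitions follow common conventions in
# bandwidth-extension work; the authors' exact evaluation may differ.
import numpy as np
from scipy.signal import stft

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between two aligned waveforms."""
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12))

def lsd(reference, estimate, fs=16000, n_fft=2048):
    """Log-spectral distance: RMS log-power-spectrum difference, averaged over frames."""
    _, _, S_ref = stft(reference, fs=fs, nperseg=n_fft)
    _, _, S_est = stft(estimate, fs=fs, nperseg=n_fft)
    log_ref = np.log10(np.abs(S_ref) ** 2 + 1e-12)
    log_est = np.log10(np.abs(S_est) ** 2 + 1e-12)
    # RMS over frequency bins, then mean over time frames
    return np.mean(np.sqrt(np.mean((log_ref - log_est) ** 2, axis=0)))
```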