FloWaveNet: A Generative Flow for Raw Audio
Authors: Sungwon Kim, Sang-Gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet requires only a single-stage training procedure and a single maximum likelihood loss, without any additional auxiliary terms, and it is inherently parallel due to the characteristics of generative flow. The model can efficiently sample raw audio in real-time, with clarity comparable to previous two-stage parallel models. The code and samples for all models, including our FloWaveNet, are publicly available. (See the coupling-layer sketch after the table.) |
| Researcher Affiliation | Collaboration | Sungwon Kim¹, Sang-gil Lee¹, Jongyoon Song¹, Jaehyeon Kim², Sungroh Yoon¹,³. ¹Electrical and Computer Engineering, Seoul National University, Seoul, Korea; ²Kakao Corporation; ³ASRI, INMC, Institute of Engineering Research, Seoul National University, Seoul, Korea. Correspondence to: Sungroh Yoon <sryoon@snu.ac.kr>. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and samples for all models, including our FloWaveNet, are publicly available. [...] ¹https://github.com/ksw0306/FloWaveNet ²https://github.com/ksw0306/ClariNet ³https://ksw0306.github.io/flowavenet-demo |
| Open Datasets | Yes | We trained the model using the LJSpeech dataset (Ito, 2017), which is a 24-hour waveform audio set of a single female speaker with 13,100 audio clips and a sample rate of 22 kHz. |
| Dataset Splits | No | The paper mentions using a "test set" but does not specify the train/validation/test splits or their percentages/counts for the datasets used in the experiments. |
| Hardware Specification | Yes | We used NVIDIA Tesla V100 GPUs with a batch size of 8 for all models. [...] We also reported the number of training iterations per second for Gaussian WaveNet, Gaussian IAF, and FloWaveNet with a single NVIDIA Tesla V100 GPU in Table 2. |
| Software Dependencies | No | The paper mentions the use of an "Adam optimizer" but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation. |
| Experiment Setup | Yes | We randomly extracted 16,000 sample chunks and normalized them to [-1, 1] as the input. [...] We used an Adam optimizer (Kingma & Ba, 2014) with a learning rate of 10⁻³ for all models, identically to the ClariNet training configuration. We scheduled the learning rate decay by a factor of 0.5 for every 200K iterations. We used NVIDIA Tesla V100 GPUs with a batch size of 8 for all models. [...] FloWaveNet has 8 context blocks. Each block contains 6 flows, which results in a total of 48 stacks of flows. We used the affine coupling layer with a 2-layer non-causal WaveNet architecture (Van Den Oord et al., 2016) and a kernel size of 3 for each flow. [...] We used 256 channels for a residual, skip, and gate channel with a gated tanh activation unit for all of the WaveNet architecture, along with the mel spectrogram condition. (Preprocessing and training-loop sketches follow the table.) |
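The abstract's claim of "a single maximum likelihood loss, without any additional auxiliary terms" follows from the change-of-variables formula for normalizing flows. Below is a minimal PyTorch sketch of one affine coupling layer and that loss. Everything here is an illustrative stand-in, not the authors' implementation: the paper uses a 2-layer non-causal WaveNet as the coupling net, whereas this sketch uses two plain convolutions, and all names (`AffineCoupling`, `flow_nll`, `hidden`) are hypothetical.

```python
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer. The coupling net below is a tiny
    stand-in (assumption) for the paper's 2-layer non-causal WaveNet.
    Input is (batch, channels, time); FloWaveNet obtains multi-channel
    input from raw audio via squeeze operations."""
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels // 2, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        xa, xb = x.chunk(2, dim=1)                # split channels in half
        log_s, t = self.net(xa).chunk(2, dim=1)   # predict scale and shift
        zb = xb * log_s.exp() + t                 # invertible affine transform
        logdet = log_s.sum(dim=(1, 2))            # log|det J| per example
        return torch.cat([xa, zb], dim=1), logdet

def flow_nll(z, logdet):
    """Single maximum-likelihood loss via change of variables:
    -log p(x) = -log N(z; 0, I) - log|det J|."""
    d = z[0].numel()
    log_pz = -0.5 * (z ** 2).sum(dim=(1, 2)) - 0.5 * d * math.log(2 * math.pi)
    return -(log_pz + logdet).mean()
```

Because the transform is invertible in closed form (xb = (zb - t) * exp(-log_s)), sampling inverts every coupling layer in one parallel pass, which is the source of the paper's parallelism claim.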
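For the preprocessing step quoted under Experiment Setup (random 16,000-sample chunks in [-1, 1]), a hedged sketch follows. Using `torchaudio` for I/O is an assumption; the paper does not name its audio library, and it does not spell out the normalization beyond the target range (float decoding of 16-bit PCM already yields [-1, 1]).

```python
import torch
import torchaudio  # assumption: any loader returning float audio would do

def random_chunk(path: str, chunk_size: int = 16000) -> torch.Tensor:
    """Load one clip and return a random 16,000-sample chunk in [-1, 1]."""
    wav, sr = torchaudio.load(path)   # float32 in [-1, 1]; LJSpeech is ~22 kHz
    wav = wav.mean(dim=0)             # collapse to mono
    start = torch.randint(0, wav.numel() - chunk_size + 1, (1,)).item()
    return wav[start:start + chunk_size]
```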
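Finally, the optimizer and schedule quoted above map onto standard PyTorch components. This toy loop reuses `AffineCoupling` and `flow_nll` from the first sketch and feeds random tensors in place of a real dataloader; stepping `StepLR` once per iteration is my reading of "decay by a factor of 0.5 for every 200K iterations", not code from the paper.

```python
import torch

flow = AffineCoupling(channels=4)    # from the coupling-layer sketch above
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=200_000, gamma=0.5)

for step in range(100):              # toy loop; real training runs far longer
    x = torch.randn(8, 4, 4000)      # batch size 8; 16,000 samples squeezed x4
    z, logdet = flow(x)
    loss = flow_nll(z, logdet)       # the single maximum-likelihood loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()                     # halves the LR every 200K iterations
```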