FloWaveNet : A Generative Flow for Raw Audio

Authors: Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet requires only a single-stage training procedure and a single maximum likelihood loss, without any additional auxiliary terms, and it is inherently parallel due to the characteristics of generative flow. The model can efficiently sample raw audio in real time, with clarity comparable to previous two-stage parallel models. The code and samples for all models, including our FloWaveNet, are publicly available. (A minimal sketch of the flow log-likelihood behind this single loss appears after the table.)
Researcher Affiliation | Collaboration | Sungwon Kim 1, Sang-gil Lee 1, Jongyoon Song 1, Jaehyeon Kim 2, Sungroh Yoon 1,3. 1 Electrical and Computer Engineering, Seoul National University, Seoul, Korea; 2 Kakao Corporation; 3 ASRI, INMC, Institute of Engineering Research, Seoul National University, Seoul, Korea. Correspondence to: Sungroh Yoon <sryoon@snu.ac.kr>.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and samples for all models, including our FloWaveNet, are publicly available. [...] https://github.com/ksw0306/FloWaveNet ; https://github.com/ksw0306/ClariNet ; https://ksw0306.github.io/flowavenet-demo
Open Datasets | Yes | We trained the model using the LJSpeech dataset (Ito, 2017), which is a 24-hour waveform audio set of a single female speaker with 13,100 audio clips and a sample rate of 22 kHz. (A hedged data-loading sketch appears after the table.)
Dataset Splits | No | The paper mentions using a "test set" but does not specify the train/validation/test splits or their percentages/counts for the datasets used in the experiments.
Hardware Specification | Yes | We used NVIDIA Tesla V100 GPUs with a batch size of 8 for all models. [...] We also reported the number of training iterations per second for Gaussian WaveNet, Gaussian IAF, and FloWaveNet with a single NVIDIA Tesla V100 GPU in Table 2.
Software Dependencies | No | The paper mentions the use of an "Adam optimizer" but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation.
Experiment Setup | Yes | We randomly extracted 16,000 sample chunks and normalized them to [-1, 1] as the input. [...] We used an Adam optimizer (Kingma & Ba, 2014) with a learning rate of 10^-3 for all models, identically to the ClariNet training configuration. We scheduled the learning rate decay by a factor of 0.5 for every 200K iterations. We used NVIDIA Tesla V100 GPUs with a batch size of 8 for all models. [...] FloWaveNet has 8 context blocks. Each block contains 6 flows, which results in a total of 48 stacks of flows. We used the affine coupling layer with a 2-layer non-causal WaveNet architecture (Van Den Oord et al., 2016) and a kernel size of 3 for each flow. [...] We used 256 channels for the residual, skip, and gate channels with a gated tanh activation unit for all of the WaveNet architecture, along with the mel spectrogram condition. (Hedged sketches of this optimizer and configuration appear after the table.)
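
The "single maximum likelihood loss" quoted in the Research Type row is the standard change-of-variables objective of generative flows. Below is a minimal PyTorch sketch of that objective for a stack of affine coupling layers; the small fully connected conditioner, the layer count, and the standard-Gaussian prior are illustrative assumptions, not the paper's actual 48-flow, WaveNet-conditioned architecture.

    import math
    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        """Affine coupling: transform half of the input conditioned on the
        other half, so the Jacobian is triangular and its log-determinant
        is simply the sum of the per-element log-scales."""
        def __init__(self, channels):
            super().__init__()
            # Hypothetical conditioner standing in for the paper's
            # 2-layer non-causal WaveNet.
            self.net = nn.Sequential(
                nn.Linear(channels // 2, 256), nn.ReLU(),
                nn.Linear(256, channels),  # outputs [log_s, t]
            )

        def forward(self, x):
            x_a, x_b = x.chunk(2, dim=-1)
            log_s, t = self.net(x_a).chunk(2, dim=-1)
            z_b = x_b * torch.exp(log_s) + t
            return torch.cat([x_a, z_b], dim=-1), log_s.sum(dim=-1)

    def flow_nll(x, layers):
        """NLL via change of variables: log p(x) = log p_Z(f(x)) + sum_i log|det J_i|."""
        z, total_logdet = x, 0.0
        for layer in layers:
            z, logdet = layer(z)
            total_logdet = total_logdet + logdet
        # Standard-Gaussian prior on z (an assumption for this sketch).
        log_pz = -0.5 * (z.pow(2) + math.log(2 * math.pi)).sum(dim=-1)
        return -(log_pz + total_logdet).mean()

    # Toy usage: 4 couplings over 16-dimensional inputs, batch of 8.
    layers = [AffineCoupling(16) for _ in range(4)]
    loss = flow_nll(torch.randn(8, 16), layers)

Because each coupling inverts in closed form, sampling runs the stack backwards with no sequential dependency across samples, which is the property the abstract calls "inherently parallel".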
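The Open Datasets and Experiment Setup rows together describe the preprocessing: random 16,000-sample chunks normalized to [-1, 1]. Here is a minimal sketch of that pipeline, assuming the standard LJSpeech wavs/ directory layout and torchaudio for decoding; the peak-amplitude normalization is one plausible reading of "normalized to [-1, 1]", since the paper gives no formula.

    import random
    from pathlib import Path

    import torch
    import torch.nn.functional as F
    import torchaudio
    from torch.utils.data import Dataset

    class LJSpeechChunks(Dataset):
        """Serve random fixed-length waveform chunks from LJSpeech."""
        def __init__(self, root, chunk_len=16000):
            self.files = sorted(Path(root).glob("wavs/*.wav"))  # assumed layout
            self.chunk_len = chunk_len

        def __len__(self):
            return len(self.files)

        def __getitem__(self, idx):
            wav, sr = torchaudio.load(str(self.files[idx]))  # (channels, T) at 22,050 Hz
            wav = wav.mean(dim=0)                            # mixdown to mono
            if wav.numel() < self.chunk_len:                 # pad clips shorter than one chunk
                wav = F.pad(wav, (0, self.chunk_len - wav.numel()))
            start = random.randint(0, wav.numel() - self.chunk_len)
            chunk = wav[start:start + self.chunk_len]
            # Peak normalization into [-1, 1] (an assumption; see above).
            return chunk / chunk.abs().max().clamp(min=1e-8)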
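The quoted optimizer and schedule map directly onto PyTorch. A minimal sketch under those settings; the stand-in model and placeholder loss exist only so the snippet runs, and are not the real 8-block, 48-flow FloWaveNet.

    import torch
    import torch.nn as nn
    from torch.optim import Adam
    from torch.optim.lr_scheduler import StepLR

    # Architecture numbers quoted from the Experiment Setup row, for reference.
    HPARAMS = {
        "context_blocks": 8,        # 8 blocks x 6 flows = 48 flows in total
        "flows_per_block": 6,
        "coupling_kernel_size": 3,  # 2-layer non-causal WaveNet per flow
        "channels": 256,            # residual, skip, and gate channels
    }

    # Tiny stand-in model so the snippet is runnable; not FloWaveNet.
    model = nn.Sequential(nn.Linear(16000, 256), nn.Tanh(), nn.Linear(256, 1))

    optimizer = Adam(model.parameters(), lr=1e-3)                # lr 10^-3, per the paper
    scheduler = StepLR(optimizer, step_size=200_000, gamma=0.5)  # x0.5 every 200K iterations

    # One training iteration on random data, batch size 8 as reported.
    x = torch.randn(8, 16000)
    loss = model(x).pow(2).mean()  # placeholder; the paper trains with the flow NLL
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()               # stepped per iteration to match the 200K schedule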