FloWaveNet : A Generative Flow for Raw Audio

Authors: Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet requires only a single-stage training procedure and a single maximum likelihood loss, without any additional auxiliary terms, and it is inherently parallel due to the characteristics of generative flow. The model can efficiently sample raw audio in real time, with clarity comparable to previous two-stage parallel models. The code and samples for all models, including our FloWaveNet, are publicly available. (A minimal sketch of the flow log-likelihood behind this single loss appears after the table.)
Researcher Affiliation | Collaboration | Sungwon Kim 1, Sang-gil Lee 1, Jongyoon Song 1, Jaehyeon Kim 2, Sungroh Yoon 1,3. 1 Electrical and Computer Engineering, Seoul National University, Seoul, Korea; 2 Kakao Corporation; 3 ASRI, INMC, Institute of Engineering Research, Seoul National University, Seoul, Korea. Correspondence to: Sungroh Yoon <sryoon@snu.ac.kr>.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and samples for all models, including our FloWaveNet, are publicly available. [...] https://github.com/ksw0306/FloWaveNet ; https://github.com/ksw0306/ClariNet ; https://ksw0306.github.io/flowavenet-demo
Open Datasets | Yes | We trained the model using the LJSpeech dataset (Ito, 2017), which is a 24-hour waveform audio set of a single female speaker with 13,100 audio clips and a sample rate of 22 kHz. (A hedged data-loading sketch appears after the table.)
Dataset Splits | No | The paper mentions using a "test set" but does not specify the train/validation/test splits or their percentages/counts for the datasets used in the experiments.
Hardware Specification | Yes | We used NVIDIA Tesla V100 GPUs with a batch size of 8 for all models. [...] We also reported the number of training iterations per second for Gaussian WaveNet, Gaussian IAF, and FloWaveNet with a single NVIDIA Tesla V100 GPU in Table 2.
Software Dependencies | No | The paper mentions the use of an "Adam optimizer" but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation.
Experiment Setup | Yes | We randomly extracted 16,000 sample chunks and normalized them to [-1, 1] as the input. [...] We used an Adam optimizer (Kingma & Ba, 2014) with a learning rate of 10^-3 for all models, identically to the ClariNet training configuration. We scheduled the learning rate decay by a factor of 0.5 for every 200K iterations. We used NVIDIA Tesla V100 GPUs with a batch size of 8 for all models. [...] FloWaveNet has 8 context blocks. Each block contains 6 flows, which results in a total of 48 stacks of flows. We used the affine coupling layer with a 2-layer non-causal WaveNet architecture (Van Den Oord et al., 2016) and a kernel size of 3 for each flow. [...] We used 256 channels for the residual, skip, and gate channels with a gated tanh activation unit for all of the WaveNet architecture, along with the mel spectrogram condition. (Hedged sketches of this optimizer and configuration appear after the table.)
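
The "single maximum likelihood loss" quoted in the Research Type row is the standard change-of-variables objective of generative flows. Below is a minimal PyTorch sketch of that objective for a stack of affine coupling layers; the small fully connected conditioner, the layer count, and the standard-Gaussian prior are illustrative assumptions, not the paper's actual 48-flow, WaveNet-conditioned architecture.

    import math
    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        """Affine coupling: transform half of the input conditioned on the
        other half, so the Jacobian is triangular and its log-determinant
        is simply the sum of the per-element log-scales."""
        def __init__(self, channels):
            super().__init__()
            # Hypothetical conditioner standing in for the paper's
            # 2-layer non-causal WaveNet.
            self.net = nn.Sequential(
                nn.Linear(channels // 2, 256), nn.ReLU(),
                nn.Linear(256, channels),  # outputs [log_s, t]
            )

        def forward(self, x):
            x_a, x_b = x.chunk(2, dim=-1)
            log_s, t = self.net(x_a).chunk(2, dim=-1)
            z_b = x_b * torch.exp(log_s) + t
            return torch.cat([x_a, z_b], dim=-1), log_s.sum(dim=-1)

    def flow_nll(x, layers):
        """NLL via change of variables: log p(x) = log p_Z(f(x)) + sum_i log|det J_i|."""
        z, total_logdet = x, 0.0
        for layer in layers:
            z, logdet = layer(z)
            total_logdet = total_logdet + logdet
        # Standard-Gaussian prior on z (an assumption for this sketch).
        log_pz = -0.5 * (z.pow(2) + math.log(2 * math.pi)).sum(dim=-1)
        return -(log_pz + total_logdet).mean()

    # Toy usage: 4 couplings over 16-dimensional inputs, batch of 8.
    layers = [AffineCoupling(16) for _ in range(4)]
    loss = flow_nll(torch.randn(8, 16), layers)

Because each coupling inverts in closed form, sampling runs the stack backwards with no sequential dependency across samples, which is the property the abstract calls "inherently parallel".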
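The Open Datasets and Experiment Setup rows together describe the preprocessing: random 16,000-sample chunks normalized to [-1, 1]. Here is a minimal sketch of that pipeline, assuming the standard LJSpeech wavs/ directory layout and torchaudio for decoding; the peak-amplitude normalization is one plausible reading of "normalized to [-1, 1]", since the paper gives no formula.

    import random
    from pathlib import Path

    import torch
    import torch.nn.functional as F
    import torchaudio
    from torch.utils.data import Dataset

    class LJSpeechChunks(Dataset):
        """Serve random fixed-length waveform chunks from LJSpeech."""
        def __init__(self, root, chunk_len=16000):
            self.files = sorted(Path(root).glob("wavs/*.wav"))  # assumed layout
            self.chunk_len = chunk_len

        def __len__(self):
            return len(self.files)

        def __getitem__(self, idx):
            wav, sr = torchaudio.load(str(self.files[idx]))  # (channels, T) at 22,050 Hz
            wav = wav.mean(dim=0)                            # mixdown to mono
            if wav.numel() < self.chunk_len:                 # pad clips shorter than one chunk
                wav = F.pad(wav, (0, self.chunk_len - wav.numel()))
            start = random.randint(0, wav.numel() - self.chunk_len)
            chunk = wav[start:start + self.chunk_len]
            # Peak normalization into [-1, 1] (an assumption; see above).
            return chunk / chunk.abs().max().clamp(min=1e-8)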
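The quoted optimizer and schedule map directly onto PyTorch. A minimal sketch under those settings; the stand-in model and placeholder loss exist only so the snippet runs, and are not the real 8-block, 48-flow FloWaveNet.

    import torch
    import torch.nn as nn
    from torch.optim import Adam
    from torch.optim.lr_scheduler import StepLR

    # Architecture numbers quoted from the Experiment Setup row, for reference.
    HPARAMS = {
        "context_blocks": 8,        # 8 blocks x 6 flows = 48 flows in total
        "flows_per_block": 6,
        "coupling_kernel_size": 3,  # 2-layer non-causal WaveNet per flow
        "channels": 256,            # residual, skip, and gate channels
    }

    # Tiny stand-in model so the snippet is runnable; not FloWaveNet.
    model = nn.Sequential(nn.Linear(16000, 256), nn.Tanh(), nn.Linear(256, 1))

    optimizer = Adam(model.parameters(), lr=1e-3)                # lr 10^-3, per the paper
    scheduler = StepLR(optimizer, step_size=200_000, gamma=0.5)  # x0.5 every 200K iterations

    # One training iteration on random data, batch size 8 as reported.
    x = torch.randn(8, 16000)
    loss = model(x).pow(2).mean()  # placeholder; the paper trains with the flow NLL
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()               # stepped per iteration to match the 200K schedule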