Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Authors: Aaron van den Oord, Yazhe Li, Igor Babuschkin, Karen Simonyan, Oriol Vinyals, Koray Kavukcuoglu, George van den Driessche, Edward Lockhart, Luis C. Cobo, Florian Stimberg, Norman Casagrande, Dominik Grewe, Seb Noury, Sander Dieleman, Erich Elsen, Nal Kalchbrenner, Heiga Zen, Alex Graves, Helen King, Tom Walters, Dan Belov, Demis Hassabis
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 then presents experimental results showing no loss in perceived quality for parallel versus original WaveNet, and continued superiority over previous benchmarks. We also present timings for sample generation, demonstrating more than 1000× speedup relative to original WaveNet. |
| Researcher Affiliation | Industry | Aaron van den Oord 1 Yazhe Li 1 Igor Babuschkin 1 Karen Simonyan 1 Oriol Vinyals 1 Koray Kavukcuoglu 1 George van den Driessche 1 Edward Lockhart 1 Luis C. Cobo 1 Florian Stimberg 1 Norman Casagrande 1 Dominik Grewe 1 Seb Noury 1 Sander Dieleman 1 Erich Elsen 1 Nal Kalchbrenner 1 Heiga Zen 1 Alex Graves 1 Helen King 1 Tom Walters 1 Dan Belov 1 Demis Hassabis 1 (1 DeepMind Technologies, London, United Kingdom). |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | In our first set of experiments, we looked at the quality of WaveNet distillation compared to the autoregressive WaveNet teacher and other baselines on data from a professional female speaker (van den Oord et al., 2016a). |
| Dataset Splits | No | The paper does not specify explicit training/validation/test dataset splits by percentage or sample count. It mentions minibatch size and total training steps but not dataset partitioning. |
| Hardware Specification | Yes | We have benchmarked the sampling speed of autoregressive and distilled WaveNets on an NVIDIA P100 GPU. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and 'XLA' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The teacher WaveNet network was trained for 1,000,000 steps with the ADAM optimiser (Kingma & Ba, 2014) with a minibatch size of 32 audio clips, each containing 7,680 timesteps (roughly 320 ms). The learning rate was held constant at 2 · 10⁻⁴, and Polyak averaging (Polyak & Juditsky, 1992) was applied over the parameters. The model consists of 30 layers, grouped into 3 dilated residual block stacks of 10 layers... The student network consisted of the same WaveNet architecture layout... The student was also trained for 1,000,000 steps with the same optimisation settings. The student typically consisted of 4 flows with 10, 10, 10, 30 layers respectively, with 64 hidden units for the residual and gating layers. (A hedged configuration sketch follows the table.) |
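
The Experiment Setup row quotes most of the teacher and student hyperparameters verbatim. The sketch below is a minimal Python reconstruction that collects those values in one place, assuming the usual WaveNet dilation-doubling pattern (1, 2, 4, ..., 512 within each 10-layer stack) and an illustrative causal filter width of 2; the class and field names, the filter width, and the 24 kHz sample-rate note are assumptions for illustration, not values confirmed by the quoted text.

```python
# Hedged sketch of the hyperparameters reported in the "Experiment Setup" row.
# Only the numeric values quoted from the paper are taken as given; the dilation
# pattern, filter width, and all identifiers are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class TeacherConfig:
    # Optimisation settings quoted in the paper's experiment setup.
    train_steps: int = 1_000_000
    optimiser: str = "adam"         # ADAM (Kingma & Ba, 2014)
    batch_size: int = 32            # audio clips per minibatch
    clip_timesteps: int = 7_680     # roughly 320 ms (implies ~24 kHz audio)
    learning_rate: float = 2e-4     # held constant
    polyak_averaging: bool = True   # Polyak & Juditsky (1992) parameter averaging

    # Architecture: 30 layers as 3 dilated residual stacks of 10 layers.
    num_stacks: int = 3
    layers_per_stack: int = 10
    filter_width: int = 2           # assumption: not specified in the quoted text

    def dilations(self) -> List[int]:
        """Per-layer dilation rates, assuming the standard WaveNet doubling
        pattern (1, 2, 4, ..., 512 within each 10-layer stack)."""
        return [2 ** i for _ in range(self.num_stacks)
                for i in range(self.layers_per_stack)]

    def receptive_field(self) -> int:
        """Receptive field in timesteps implied by the assumed dilations."""
        return sum((self.filter_width - 1) * d for d in self.dilations()) + 1


@dataclass
class StudentConfig:
    # Student network as quoted: 4 flows of 10/10/10/30 layers, 64 hidden units
    # for the residual and gating layers, same optimisation settings as the teacher.
    flow_layers: List[int] = field(default_factory=lambda: [10, 10, 10, 30])
    hidden_units: int = 64
    train_steps: int = 1_000_000
    learning_rate: float = 2e-4


if __name__ == "__main__":
    teacher = TeacherConfig()
    print(teacher.dilations())        # [1, 2, 4, ..., 512] repeated for 3 stacks
    print(teacher.receptive_field())  # 3070 timesteps under these assumptions
```

Under these assumptions the teacher's receptive field works out to about 3,070 timesteps, i.e. roughly 128 ms at the ~24 kHz rate implied by "7,680 timesteps (roughly 320 ms)"; a different filter width would change that figure.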