Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Authors: Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Mohammad Norouzi, Douglas Eck, Karen Simonyan
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. |
| Researcher Affiliation | Industry | ¹Google Brain, ²DeepMind. Correspondence to: Jesse Engel <jesseengel@google.com>. |
| Pseudocode | No | The paper describes the models and architectures textually and through diagrams (Figure 1), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that the NSynth dataset is publicly available, but it does not provide a link or statement about the open-source code for the WaveNet autoencoder model itself. |
| Open Datasets | Yes | The full NSynth dataset will be made publicly available in a serialized format after publication. It is available for download at http://download.magenta.tensorflow.org/nsynth as TFRecord files split into training and holdout sets. (A hedged loading sketch appears below the table.) |
| Dataset Splits | No | The paper mentions that the dataset is split into "training and holdout sets" but does not specify a separate validation split or its proportions. |
| Hardware Specification | No | The paper describes training parameters but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using an "Adam optimizer" and "TensorFlow Example protocol buffer", but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The baseline models commonly use a learning rate of 1e-4, while the WaveNet models use a schedule, starting at 2e-4 and descending to 6e-5, 2e-5, and 6e-6 at iterations 120k, 180k, and 240k respectively. The baseline models train asynchronously for 1800k iterations with a batch size of 8. The WaveNet models train synchronously for 250k iterations with a batch size of 32. (A schedule sketch appears below the table.) |
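
The dataset entry above notes that NSynth ships as TFRecord files split into training and holdout sets. The following minimal sketch shows how such files could be read with the `tf.data` API. The file name and the feature keys (`audio`, `pitch`, `velocity`) are illustrative assumptions, not confirmed by the paper; the 64,000-sample audio length reflects the paper's 4-second notes at 16 kHz.

```python
import tensorflow as tf

# Hypothetical file name; actual names in the release may differ.
files = ["nsynth-train.tfrecord"]

# Assumed feature spec: each note is serialized as a TensorFlow Example
# protocol buffer. The keys below are illustrative guesses.
feature_spec = {
    "audio": tf.io.FixedLenFeature([64000], tf.float32),  # 4 s at 16 kHz
    "pitch": tf.io.FixedLenFeature([], tf.int64),
    "velocity": tf.io.FixedLenFeature([], tf.int64),
}

def parse_note(serialized):
    return tf.io.parse_single_example(serialized, feature_spec)

dataset = (
    tf.data.TFRecordDataset(files)
    .map(parse_note, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(8)  # baseline batch size from the experiment setup row
)
```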
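The WaveNet learning-rate schedule in the experiment setup row is piecewise constant, so it maps directly onto a standard Keras schedule. A minimal sketch, assuming a current TensorFlow/Keras API (the paper itself predates this interface):

```python
import tensorflow as tf

# 2e-4 until step 120k, then 6e-5, 2e-5, and 6e-6 at steps 120k/180k/240k,
# as reported for the WaveNet models (trained 250k iterations total).
schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[120_000, 180_000, 240_000],
    values=[2e-4, 6e-5, 2e-5, 6e-6],
)

# The paper reports using the Adam optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

print(float(schedule(100_000)))  # 0.0002
print(float(schedule(200_000)))  # 2e-05
```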