GANSynth: Adversarial Neural Audio Synthesis
Authors: Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, Adam Roberts
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive empirical investigations on the NSynth dataset, we demonstrate that GANs are able to outperform strong WaveNet baselines on automated and human evaluation metrics, and efficiently generate audio several orders of magnitude faster than their autoregressive counterparts. |
| Researcher Affiliation | Industry | Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, & Adam Roberts, Google AI, Mountain View, CA 94043, USA |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Online resources: Colab Notebook: http://goo.gl/magenta/gansynth-demo, Audio Examples: http://goo.gl/magenta/gansynth-examples, Code: http://goo.gl/magenta/gansynth-code |
| Open Datasets | Yes | We focus our study on the NSynth dataset, which contains 300,000 musical notes from 1,000 different instruments aligned and recorded in isolation. NSynth is a difficult dataset composed of highly diverse timbres and pitches, but it is also highly structured with labels for pitch, velocity, instrument, and acoustic qualities (Liu et al., 2015; Engel et al., 2017). Each sample is four seconds long and sampled at 16 kHz, giving 64,000 dimensions. As we wanted to include human evaluations of audio quality, we restricted ourselves to training on the subset of acoustic instruments and fundamental pitches ranging from MIDI 24-84 (~32-1000 Hz), as those timbres are most likely to sound natural to an average listener (see the MIDI-to-Hz sketch after the table). This left us with 70,379 examples from instruments that are mostly strings, brass, woodwinds, and mallets. We created a new 80/20 train/test split from shuffled data, as the original split was divided along instrument type, which isn't desirable for this task. Dataset: https://magenta.tensorflow.org/datasets/nsynth |
| Dataset Splits | Yes | We created a new 80/20 train/test split from shuffled data, as the original split was divided along instrument type, which isn't desirable for this task (see the re-split sketch after the table). |
| Hardware Specification | Yes | We train each GAN variant for 4.5 days on a single V100 GPU, with a batch size of 8. |
| Software Dependencies | No | The paper mentions TensorFlow, the ADAM optimizer (Kingma & Ba, 2014), mu-law encoding, and a mixture of 10 logistics (Salimans et al., 2017), but does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | All models were trained with the ADAM optimizer (Kingma & Ba, 2014). We sweep over learning rates (2e-4, 4e-4, 8e-4) and weights of the auxiliary classifier loss (0.1, 1.0, 10), and find that for all variants (spectral representation, progressive/no progressive, frequency resolution) a learning rate of 8e-4 and a classifier loss weight of 10 perform best. We train each GAN variant for 4.5 days on a single V100 GPU, with a batch size of 8. For non-progressive models, this equates to training on 5M examples. For progressive models, we train on 1.6M examples per stage (7 stages): 800k during alpha blending and 800k after blending (see the sweep and schedule sketches after the table). |
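
The pitch range quoted in the dataset row, MIDI 24-84, maps to frequency through the standard equal-temperament relation f = 440 * 2^((m - 69)/12). A minimal sketch checking the paper's ~32-1000 Hz figure (plain Python, independent of the released code):

```python
def midi_to_hz(midi_note: int) -> float:
    """Equal-temperament pitch, with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

# MIDI 24 (C1) ~= 32.7 Hz and MIDI 84 (C6) ~= 1046.5 Hz,
# consistent with the ~32-1000 Hz range quoted in the paper.
print(midi_to_hz(24))  # 32.70...
print(midi_to_hz(84))  # 1046.50...
```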
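The authors discard NSynth's official instrument-disjoint split and re-split 80/20 from shuffled data. A minimal sketch of such a re-split, assuming the `examples.json` metadata file from the public NSynth release; the fixed seed and file layout are illustrative, not from the paper:

```python
import json
import random

# Assumption: note IDs are the keys of examples.json, as in the public
# NSynth release; the 80/20 ratio and shuffling come from the paper.
with open("examples.json") as f:
    note_ids = sorted(json.load(f).keys())

rng = random.Random(0)  # illustrative fixed seed for reproducibility
rng.shuffle(note_ids)

cut = int(0.8 * len(note_ids))
train_ids, test_ids = note_ids[:cut], note_ids[cut:]
print(len(train_ids), len(test_ids))
```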
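The hyperparameter search described in the setup row is a plain 3x3 grid over learning rate and classifier-loss weight. A sketch of enumerating it; the config dict and launcher are hypothetical:

```python
import itertools

learning_rates = (2e-4, 4e-4, 8e-4)         # values from the paper
classifier_loss_weights = (0.1, 1.0, 10.0)  # values from the paper

for lr, cls_w in itertools.product(learning_rates, classifier_loss_weights):
    config = {"learning_rate": lr, "classifier_loss_weight": cls_w}
    # launch_training(config)  # hypothetical launcher, one run per grid cell
    print(config)
```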
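The progressive schedule implies a per-stage fade-in coefficient that ramps from 0 to 1 over the first 800k examples and then holds at 1 for another 800k. A sketch of that schedule and the step counts the paper's numbers imply; the function name and linear ramp are assumptions consistent with standard progressive-GAN training, not taken from the released code:

```python
EXAMPLES_PER_PHASE = 800_000  # 800k blending + 800k stabilizing, per the paper
BATCH_SIZE = 8                # per the paper
NUM_STAGES = 7                # per the paper

def blend_alpha(examples_seen_in_stage: int) -> float:
    """Fade-in weight for the newest resolution block (ramps 0 -> 1, then held)."""
    return min(1.0, examples_seen_in_stage / EXAMPLES_PER_PHASE)

# Rough bookkeeping implied by the quoted numbers:
steps_per_stage = 2 * EXAMPLES_PER_PHASE // BATCH_SIZE  # 200,000 steps per stage
total_examples = NUM_STAGES * 2 * EXAMPLES_PER_PHASE    # 11.2M examples overall
```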