Adversarial Generation of Time-Frequency Features with application in audio synthesis

Authors: Andrés Marafioti, Nathanaël Perraudin, Nicki Holighaus, Piotr Majdak

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the potential of deliberate generative TF modeling by training a generative adversarial network (GAN) on short-time Fourier features. We show that by applying our guidelines, our TF-based network was able to outperform a state-of-the-art GAN generating waveforms directly, despite the similar architecture in the two networks. ... To evaluate the performance of TiFGAN, we trained TiFGAN-M and TiFGAN-MTF using the procedure outlined above on two datasets from (Donahue et al., 2019): (a) Speech, a subset of spoken digits zero through nine (sc09) from the Speech Commands Dataset (Warden, 2018). (b) Music, a dataset of 25 minutes of piano recordings of Bach compositions, segmented into approximately 19,000 overlapping samples of 1 s duration.
Researcher Affiliation | Academia | (1) Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, 1040 Vienna, Austria. (2) Swiss Data Science Center, ETH Zürich, Universitätstrasse 25, 8006 Zürich.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It includes diagrams of the network architecture, but no text-based algorithmic steps.
Open Source Code | Yes | Our software, complemented by instructive examples, is available at http://tifgan.github.io.
Open Datasets | Yes | To evaluate the performance of TiFGAN, we trained TiFGAN-M and TiFGAN-MTF using the procedure outlined above on two datasets from (Donahue et al., 2019): (a) Speech, a subset of spoken digits zero through nine (sc09) from the Speech Commands Dataset (Warden, 2018). (A data-loading sketch for this subset follows the table.)
Dataset Splits | No | The paper mentions using two datasets but does not provide specific details on how these datasets were split into training, validation, or test sets (e.g., percentages or exact counts), nor does it reference predefined splits with citations for reproducibility.
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.
Software Dependencies | No | The paper mentions software components like 'SciPy and Tensorflow', 'Large Time-Frequency Analysis Toolbox (LTFAT, Průša et al., 2014)', and 'ADAM optimizer (Kingma & Ba, 2015)', but it does not specify version numbers for these libraries or frameworks, which are crucial for reproducibility.
Experiment Setup | Yes | For the short-time Fourier transform, we fix the minimal redundancy that we consider reliable, i.e., M/a = 4 and select a = 128, M = 512, such that M_R = 257, N = L/a = 128 and the STFT matrix S is of size C^(M_R × N). ... The dynamic range of the log-magnitude is limited by clipping at −r (in our experiments r = 10), before scaling and shifting to the range of the generator output [−1, 1]... Our networks were trained for 200k steps... We optimized the Wasserstein loss (Gulrajani et al., 2017) with the gradient penalty hyperparameter set to 10 using the ADAM optimizer (Kingma & Ba, 2015) with α = 10^−4, β_1 = 0.5, β_2 = 0.9 and performed 5 updates of the discriminator for every update of the generator. (Sketches of the STFT preprocessing and of the WGAN-GP update follow the table.)
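
To make the Open Datasets row concrete, below is a minimal sketch of how the sc09 subset (spoken digits zero through nine, roughly 1 s clips at 16 kHz) could be assembled from the Speech Commands Dataset (Warden, 2018). The directory path, the fixed clip length of 16384 samples (inferred from N = L/a = 128 with a = 128), and the function name are assumptions for illustration; the paper does not prescribe this loading code.

```python
import os
import numpy as np
from scipy.io import wavfile

# Hypothetical location of an extracted Speech Commands Dataset archive.
SPEECH_COMMANDS_DIR = "data/speech_commands"
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
SAMPLE_RATE = 16000   # Speech Commands clips are recorded at 16 kHz
CLIP_LENGTH = 16384   # assumed L = 16384 samples (~1 s), so that N = L/a = 128 with a = 128

def load_sc09(root=SPEECH_COMMANDS_DIR):
    """Collect the digit subset (sc09) as fixed-length float32 waveforms in [-1, 1]."""
    clips = []
    for word in DIGIT_WORDS:
        word_dir = os.path.join(root, word)
        for fname in sorted(os.listdir(word_dir)):
            if not fname.endswith(".wav"):
                continue
            sr, x = wavfile.read(os.path.join(word_dir, fname))
            assert sr == SAMPLE_RATE
            x = x.astype(np.float32) / 32768.0          # int16 PCM -> [-1, 1]
            if len(x) < CLIP_LENGTH:                    # zero-pad short clips
                x = np.pad(x, (0, CLIP_LENGTH - len(x)))
            clips.append(x[:CLIP_LENGTH])               # trim long clips
    return np.stack(clips)
```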
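The Experiment Setup row fixes the STFT parameters a = 128 (hop) and M = 512 (channels), giving M_R = 257 frequency bins, and describes clipping the log-magnitude at −r with r = 10 before rescaling to [−1, 1]. The sketch below reproduces those numbers with scipy.signal.stft. It is only an approximation of the paper's pipeline: the Hann window and the normalization of the log-magnitude to a maximum of 0 before clipping are assumptions, and SciPy's boundary handling will not yield exactly N = 128 frames the way the paper's LTFAT-based, periodic STFT does.

```python
import numpy as np
from scipy.signal import stft

a, M = 128, 512   # hop size and number of frequency channels, as stated in the paper
r = 10.0          # dynamic-range limit for the log-magnitude

def log_magnitude_features(x):
    """Clipped, rescaled log-magnitude STFT in [-1, 1] (illustrative window and scaling)."""
    # nperseg=M, noverlap=M-a gives a hop of a samples and M//2 + 1 = 257 frequency bins.
    _, _, S = stft(x, window="hann", nperseg=M, noverlap=M - a, nfft=M,
                   boundary=None, padded=False)
    logmag = np.log(np.abs(S) + 1e-10)
    logmag -= logmag.max()               # assumed normalization: maximum log-magnitude at 0
    logmag = np.clip(logmag, -r, 0.0)    # limit the dynamic range by clipping at -r
    return logmag / (r / 2.0) + 1.0      # shift/scale from [-r, 0] to [-1, 1]
```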
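The same row specifies the optimization scheme: a Wasserstein loss with gradient-penalty weight 10 (Gulrajani et al., 2017), ADAM with α = 10^−4, β_1 = 0.5, β_2 = 0.9, and 5 discriminator updates per generator update over 200k steps. Below is a minimal TensorFlow 2 sketch of one such discriminator update with those hyperparameters; the generator, discriminator, latent dimension, and 4-D feature shape are placeholders, not the architecture from the paper.

```python
import tensorflow as tf

GP_WEIGHT = 10.0   # gradient penalty hyperparameter from the paper
N_CRITIC = 5       # discriminator updates performed per generator update
d_opt = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.5, beta_2=0.9)
g_opt = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.5, beta_2=0.9)

def gradient_penalty(discriminator, real, fake):
    """WGAN-GP penalty on random interpolates of real and fake batches (assumes 4-D inputs)."""
    eps = tf.random.uniform([tf.shape(real)[0], 1, 1, 1], 0.0, 1.0)
    interp = eps * real + (1.0 - eps) * fake
    with tf.GradientTape() as tape:
        tape.watch(interp)
        d_interp = discriminator(interp, training=True)
    grads = tape.gradient(d_interp, interp)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return tf.reduce_mean(tf.square(norms - 1.0))

def discriminator_step(discriminator, generator, real_batch, latent_dim=100):
    """One of the N_CRITIC discriminator updates; latent_dim=100 is a placeholder choice."""
    z = tf.random.normal([tf.shape(real_batch)[0], latent_dim])
    with tf.GradientTape() as tape:
        fake = generator(z, training=True)
        d_loss = (tf.reduce_mean(discriminator(fake, training=True))
                  - tf.reduce_mean(discriminator(real_batch, training=True))
                  + GP_WEIGHT * gradient_penalty(discriminator, real_batch, fake))
    grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))
    return d_loss
```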