Deep Audio Priors Emerge From Harmonic Convolutional Networks

Authors: Zhoutong Zhang, Yunyun Wang, Chuang Gan, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we empirically show that current network architectures for audio processing do not show strong evidence in capturing such priors. We propose Harmonic Convolution, an operation that helps deep networks model priors in audio signals by explicitly utilizing the harmonic structure. This is done by engineering the kernels to be supported by sets of harmonic series, instead of by local neighborhoods as convolutional kernels. We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance on unsupervised audio restoration. With Harmonic Convolution, they also achieve better generalization performance for supervised musical source separation. Code and examples are available at our project page: http://dap.csail.mit.edu."
Researcher Affiliation | Collaboration | 1 Massachusetts Institute of Technology; 2 IIIS, Tsinghua University; 3 MIT-IBM Watson Lab; 4 Stanford University; 5 Google Research
Pseudocode | No | The paper provides mathematical definitions and implementation details but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Code and examples are available at our project page: http://dap.csail.mit.edu."
Open Datasets | Yes | "We use the LJ-Speech (Ito, 2017) dataset and the MUSIC (Zhao et al., 2018) dataset."
Dataset Splits | Yes | "We use a 90:10 train-val split, and test the performance on the mixture between sounds of the target instrument and the sounds of the holdout instrument."
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU or GPU models.
Software Dependencies | No | The paper names software components such as the Adam optimizer and the normalization and activation layers it uses, but it does not specify version numbers for reproducibility.
Experiment Setup | Yes | "We train all the networks using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.001 for all the experiments. The random input is drawn from a standard Gaussian distribution, and the weights are initialized by drawing from a zero-mean Gaussian distribution with a standard deviation of 0.02."
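The abstract quoted in the Research Type row describes the core idea of Harmonic Convolution: kernel taps are placed on a harmonic series of each output frequency bin rather than on a local neighborhood. The sketch below is a deliberately simplified, illustrative version along the frequency axis of a magnitude spectrogram; the function name `harmonic_conv1d`, the shapes, and the single shared kernel are assumptions for illustration only (the paper's operation also involves fractional anchors and learned mixing, which are omitted here).

```python
import numpy as np

def harmonic_conv1d(spec, weights):
    """Toy harmonic convolution along the frequency axis.

    spec:    (n_freq, n_time) magnitude spectrogram
    weights: (K,) kernel with one tap per integer harmonic

    For each output bin f, the kernel reads the bins at the integer
    harmonics f, 2f, ..., Kf instead of the local neighborhood
    f-1, f, f+1 that a standard convolution would use.
    """
    n_freq, n_time = spec.shape
    K = weights.shape[0]
    out = np.zeros_like(spec)
    for f in range(1, n_freq):          # skip the DC bin
        taps = np.zeros((K, n_time))
        for k in range(1, K + 1):
            idx = k * f                 # k-th harmonic of bin f
            if idx < n_freq:
                taps[k - 1] = spec[idx]
        out[f] = weights @ taps         # weighted sum over harmonics
    return out

# A tone with energy at bins 2, 4, 6 (harmonics of bin 2) responds
# strongly at output bin 2, where all three taps line up.
spec = np.zeros((16, 2))
spec[2] = spec[4] = spec[6] = 1.0
out = harmonic_conv1d(spec, np.ones(3) / 3)
```

The design point this illustrates is the inductive bias: a harmonic stack activates the single bin whose harmonic series it matches, which is what lets such networks act as priors for natural audio.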
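The Dataset Splits row states only the 90:10 train-val ratio; the shuffling procedure and seed are not reported. A minimal sketch of one reasonable reading, with the helper name `split_90_10` and the fixed seed being assumptions:

```python
import random

def split_90_10(items, seed=0):
    """Hypothetical 90:10 train/val split.

    The paper gives the ratio but not the shuffling or seeding
    procedure, so both are placeholder choices here.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for the sketch
    cut = int(0.9 * len(items))
    return items[:cut], items[cut:]

train, val = split_90_10(range(100))
```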
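The Experiment Setup row pins down three reproducible details: Adam with learning rate 0.001, standard-Gaussian network input, and zero-mean Gaussian weight initialization with standard deviation 0.02. The sketch below instantiates exactly those numbers around a toy linear layer and a single hand-written Adam update; the layer shapes and the squared-activation objective are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight init as stated: zero-mean Gaussian, std 0.02
w0 = rng.normal(0.0, 0.02, size=(64, 32))
w = w0.copy()

# Network input as stated: standard Gaussian noise
z = rng.normal(0.0, 1.0, size=(8, 64))

# Toy objective (placeholder): mean squared activation of a linear layer
y = z @ w                        # (8, 32)
grad = 2.0 * z.T @ y / y.size    # analytic gradient of mean(y**2) w.r.t. w

# One Adam step (Kingma & Ba, 2015) with the paper's lr = 0.001
lr, b1, b2, eps, t = 1e-3, 0.9, 0.999, 1e-8, 1
m = (1 - b1) * grad              # biased first-moment estimate (m0 = 0)
v = (1 - b2) * grad ** 2         # biased second-moment estimate (v0 = 0)
m_hat = m / (1 - b1 ** t)        # bias correction
v_hat = v / (1 - b2 ** t)
w -= lr * m_hat / (np.sqrt(v_hat) + eps)
```

A useful sanity check on the hyperparameters: because Adam normalizes each coordinate by its gradient magnitude, no single update can move a weight by more than roughly the learning rate, so the first step is bounded by 0.001 per coordinate.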