TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer

Authors: Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, Roger B. Grosse

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted two sets of experiments: 1) pitch-shifting and tempo-changing experiments to further validate our choice of CQT representation; 2) tests of our full TimbreTron pipeline (along with ablation experiments to validate our architectural choices).
Researcher Affiliation | Academia | University of Toronto, Vector Institute, Dalhousie University
Pseudocode | No | The paper describes procedural steps for components like beam search, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | Yes | Code available at: https://github.com/huangsicong/TimbreTron
Open Datasets | Yes | Our Real World Dataset comprises data collected from YouTube videos of people performing solo on different instruments... Here is a complete list of YouTube links from which we collected our Real World Dataset. Our MIDI dataset consists of two parts: MIDI-BACH and MIDI-Chopin.
Dataset Splits | No | We've also randomly taken out some segments for the validation set. This indicates that a validation set exists, but no specific split information (percentages or counts) is provided.
Hardware Specification | No | The paper mentions a 'GPU memory constraint' but provides no specific hardware details such as GPU/CPU models, memory sizes, or cloud resources used for the experiments.
Software Dependencies | No | The paper mentions software components such as the Adam optimizer and librosa, but does not specify version numbers for these or any other key libraries or dependencies.
Experiment Setup | Yes | The weighting for our cycle consistency loss is 10 and the weighting of the identity loss is 5. Our learning rate is exponentially warmed up to 0.0001 over 2500 steps, stays constant, then at step 100000 starts to linearly decay to zero. The total training length is 1.5 million steps, trained with the Adam optimizer (Kingma and Ba, 2014) with β1 = 0 and β2 = 0.9, and a batch size of 1. For the conditional WaveNet, we used a kernel size of 3 for all the dilated convolution layers and the initial causal convolution. We maintain a constant number of candidates (beam width = 8) by replicating the remaining candidate waveforms after each pruning process.
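To make the reported optimization settings concrete, here is a minimal sketch of the learning-rate schedule and optimizer configuration quoted above. It assumes PyTorch (the framework is not named in this excerpt), an arbitrary starting learning rate for the exponential warm-up, and that the linear decay reaches zero at the final (1.5 millionth) step; none of these choices come from the paper.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

PEAK_LR      = 1e-4       # learning rate reached after warm-up (from the paper)
WARMUP_STEPS = 2_500      # exponential warm-up length (from the paper)
DECAY_START  = 100_000    # step at which linear decay begins (from the paper)
TOTAL_STEPS  = 1_500_000  # total number of training steps (from the paper)
INIT_LR      = 1e-6       # assumed warm-up starting point; not stated in the paper

def lr_multiplier(step: int) -> float:
    """Multiplier on PEAK_LR: exponential warm-up -> constant -> linear decay."""
    if step < WARMUP_STEPS:
        # Geometric ramp from INIT_LR up to PEAK_LR; the exact warm-up curve
        # is not given in the excerpt, so this form is an assumption.
        return (INIT_LR / PEAK_LR) ** (1.0 - step / WARMUP_STEPS)
    if step < DECAY_START:
        return 1.0
    # Linear decay; reaching exactly zero at TOTAL_STEPS is an assumption.
    return max(0.0, 1.0 - (step - DECAY_START) / (TOTAL_STEPS - DECAY_START))

model = torch.nn.Linear(16, 16)  # stand-in module; the real models are the CycleGAN networks
optimizer = Adam(model.parameters(), lr=PEAK_LR, betas=(0.0, 0.9))  # beta1 = 0, beta2 = 0.9
scheduler = LambdaLR(optimizer, lr_lambda=lr_multiplier)

# Spot-check the schedule at a few steps; during training, scheduler.step()
# would be called once per optimizer step.
for step in (0, 2_500, 50_000, 100_000, 800_000, 1_500_000):
    print(step, f"{PEAK_LR * lr_multiplier(step):.2e}")
```

The beam-width bookkeeping in the last sentence of the quoted setup can likewise be sketched. The snippet below only illustrates the replicate-after-pruning idea at beam width 8; the actual pruning criterion, the number of survivors per round, and the waveform scoring are not described in this excerpt, so `prune` and `score` are placeholders.

```python
BEAM_WIDTH = 8  # beam width reported in the paper

def refill_beam(survivors):
    """Replicate the candidate waveforms that survived pruning so the beam
    returns to a constant size of BEAM_WIDTH (cycling through survivors)."""
    assert 0 < len(survivors) <= BEAM_WIDTH
    return [survivors[i % len(survivors)] for i in range(BEAM_WIDTH)]

def prune(candidates, score):
    """Hypothetical pruning step: keep the top-scoring half of the beam.
    The criterion actually used by the authors is not given in the excerpt."""
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[: max(1, BEAM_WIDTH // 2)]

# Usage inside a sampling loop (candidates and the scoring function are placeholders):
# candidates = refill_beam(prune(candidates, score=my_scoring_fn))
```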