TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer
Authors: Sicong Huang, Qiyang Li, Cem Anil, Xuchan Bao, Sageev Oore, Roger B. Grosse
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted two sets of experiments to 1) experiment with pitch-shifting and tempo-changing to further validate our choice of CQT representation; 2) test our full TimbreTron pipeline (along with ablation experiments to validate our architectural choices). A sketch of the CQT pitch-shift property referenced in 1) appears after the table. |
| Researcher Affiliation | Academia | University of Toronto, Vector Institute, Dalhousie University |
| Pseudocode | No | The paper describes procedural steps for components like beam search, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Code available at: https://github.com/huangsicong/TimbreTron |
| Open Datasets | Yes | Our Real World Dataset comprises data collected from YouTube videos of people performing solo on different instruments... Here is a complete list of YouTube links from which we collected our Real World Dataset. Our MIDI dataset consists of two parts: MIDI-BACH and MIDI-Chopin. |
| Dataset Splits | No | The paper states "We've also randomly taken out some segments for the validation set," which indicates a validation set exists, but no specific split information (percentages or counts) is provided. |
| Hardware Specification | No | The paper mentions 'GPU memory constraint' but provides no specific hardware details such as GPU/CPU models, memory, or specific cloud resources used for experiments. |
| Software Dependencies | No | The paper mentions software components such as the Adam optimizer and librosa, but does not specify version numbers for these or for any other key libraries or dependencies. |
| Experiment Setup | Yes | The weighting for our cycle consistency loss is 10 and the weighting of the identity loss is 5. Our learning rate is exponentially warmed up to 0.0001 over 2500 steps, stays constant, then at step 100000 starts to linearly decay to zero. Training runs for 1.5 million steps in total with the Adam optimizer (Kingma and Ba, 2014), β1 = 0, β2 = 0.9, and a batch size of 1. For the conditional WaveNet, we used a kernel size of 3 for all the dilated convolution layers and the initial causal convolution. We maintain a constant number of candidates (beam width = 8) by replicating the remaining candidate waveforms after each pruning process. |
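As a reading aid for the schedule quoted in the Experiment Setup row, the sketch below encodes the warmup/constant/decay phases as a step-to-learning-rate function. The exact shape of the exponential warmup and the decay endpoint are not stated in the row, so both are labeled assumptions; this is not taken from the paper's released code.

```python
# Values quoted in the Experiment Setup row; the warmup curve and the decay
# endpoint (reaching zero at the final step) are assumptions -- the row only
# says the decay starts at step 100000 and training runs 1.5 million steps.
PEAK_LR = 1e-4
WARMUP_STEPS = 2_500
DECAY_START = 100_000
TOTAL_STEPS = 1_500_000


def learning_rate(step: int) -> float:
    """Exponential warmup -> constant -> linear decay to zero."""
    if step < WARMUP_STEPS:
        # Log-linear interpolation from a small floor up to the peak LR
        # (the exact warmup curve is not given in the paper).
        floor = PEAK_LR * 1e-3
        return floor * (PEAK_LR / floor) ** (step / WARMUP_STEPS)
    if step < DECAY_START:
        return PEAK_LR
    # Linear decay, assumed to reach zero exactly at TOTAL_STEPS.
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - DECAY_START)


if __name__ == "__main__":
    # The quoted optimizer settings are Adam with beta1 = 0, beta2 = 0.9
    # and batch size 1; e.g. in PyTorch (not the paper's framework):
    #   torch.optim.Adam(model.parameters(), lr=PEAK_LR, betas=(0.0, 0.9))
    for s in (0, 2_500, 50_000, 100_000, 800_000, 1_500_000):
        print(f"step {s:>9}: lr = {learning_rate(s):.2e}")
```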
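The Research Type row cites pitch-shifting experiments used to validate the CQT representation; the underlying property is that, on the CQT's log-frequency axis, a pitch shift is approximately a vertical translation of the spectrogram. The sketch below illustrates that property with librosa (which the paper mentions, without a version); the sample rate, bin counts, and file path are illustrative assumptions, not values from the paper or its code.

```python
import numpy as np
import librosa

# Illustrative parameters -- not taken from the paper's configuration.
SR = 16000                 # sample rate (assumption)
BINS_PER_OCTAVE = 24       # CQT resolution (assumption)
N_BINS = 24 * 7            # 7 octaves above librosa's default fmin (assumption)
SEMITONE_SHIFT = 2         # shift up by a whole tone

# "example.wav" is a placeholder path, not a file shipped with the paper.
y, sr = librosa.load("example.wav", sr=SR, mono=True)

# Constant-Q transform: frequency bins are log-spaced, so a pitch shift of
# k semitones is (approximately) a vertical translation by
# k * BINS_PER_OCTAVE / 12 bins.
C = np.abs(librosa.cqt(y, sr=sr, n_bins=N_BINS,
                       bins_per_octave=BINS_PER_OCTAVE))

shift_bins = SEMITONE_SHIFT * BINS_PER_OCTAVE // 12
C_shifted = np.roll(C, shift_bins, axis=0)
C_shifted[:shift_bins, :] = 0.0   # discard bins that wrapped around

# For comparison, pitch-shift the waveform itself and recompute its CQT;
# the two magnitude spectrograms should roughly agree.
y_ps = librosa.effects.pitch_shift(y, sr=sr, n_steps=SEMITONE_SHIFT)
C_ps = np.abs(librosa.cqt(y_ps, sr=sr, n_bins=N_BINS,
                          bins_per_octave=BINS_PER_OCTAVE))

print("mean |difference|:", float(np.mean(np.abs(C_shifted - C_ps))))
```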