SynthNet: Learning to Synthesize Music End-to-End

Authors: Florin Schimbinschi, Christian Walder, Sarah M. Erfani, James Bailey

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare exact replicas of the architectures described in [Van Den Oord et al., 2016; Arik et al., 2017] with our proposed architecture SynthNet. ... For the purpose of validating our hypotheses, we chose to eliminate extra sources of error by manually upsampling the midi files. ... We compare the performance and quality of these two baselines against SynthNet initially in Table 3 over three sets of hyperparameters (Table 2). For the best resulting models we perform MOS listening tests, shown in Table 5.
Researcher Affiliation | Collaboration | Florin Schimbinschi (1), Christian Walder (2), Sarah M. Erfani (1) and James Bailey (1); (1) The University of Melbourne, (2) Data61 CSIRO, Australian National University
Pseudocode | No | No explicit pseudocode or algorithm blocks were found.
Open Source Code | Yes | All are implemented in PyTorch, available at https://github.com/florinsch/synthnet.
Open Datasets | Yes | We generate the dataset using the freely available Timidity++ software synthesizer. ... We used labeled audio recordings of real audio performances from the MusicNet dataset [Thickstun et al., 2016]. (A hedged rendering sketch follows the table.)
Dataset Splits | Yes | After synthesizing the audio, we have approximately 12 minutes of audio for each timbre, of which 9 minutes (75%) is used for training and 3 minutes (25%) for validation. (See the split sketch after the table.)
Hardware Specification | Yes | All training is done on Tesla P100 GPUs with 16GB of memory.
Software Dependencies | No | The paper mentions "All are implemented in PyTorch" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | We use the Adam [Kingma and Ba, 2014] optimizer with a batch size of 1, a learning rate of 10⁻³, β1 = 0.9, β2 = 0.999 and ε = 10⁻⁸, with a weight decay of 10⁻⁵. We find that for most instruments 100-150 epochs is enough for generating high quality audio, however we keep training up to 200 epochs to observe any unexpected behaviour or overfitting. All training is done on Tesla P100 GPUs with 16GB of memory.
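The dataset row above cites Timidity++ for rendering the training audio from MIDI. As a hedged illustration only (the paper excerpt does not state the exact rendering options, sample rate, or file layout), a per-file rendering step could look like the sketch below; all paths and the sample rate are assumptions.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: one directory of MIDI files rendered to WAV, one file per piece.
MIDI_DIR = Path("midi")    # assumption: not specified in the paper excerpt
OUT_DIR = Path("audio")    # assumption
SAMPLE_RATE = 16000        # assumption: typical for WaveNet-style models

OUT_DIR.mkdir(exist_ok=True)
for midi_file in sorted(MIDI_DIR.glob("*.mid")):
    wav_file = OUT_DIR / (midi_file.stem + ".wav")
    # timidity: -Ow writes RIFF WAVE output, -s sets the sampling frequency,
    # -o names the output file. Instrument/timbre selection (e.g. via a
    # soundfont or config file) is omitted because the excerpt does not state it.
    subprocess.run(
        ["timidity", str(midi_file), "-Ow", "-s", str(SAMPLE_RATE), "-o", str(wav_file)],
        check=True,
    )
```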
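The 75%/25% split quoted in the dataset-splits row maps roughly 12 minutes of audio per timbre to about 9 training minutes and 3 validation minutes. A minimal sketch of such a split, assuming the audio is loaded as a single 1-D sample array and that the split is contiguous (the excerpt does not say), might be:

```python
import numpy as np

def split_waveform(samples: np.ndarray, train_fraction: float = 0.75):
    """Split a 1-D array of audio samples into contiguous train/validation parts."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

# Example with 12 minutes of (silent) audio at an assumed 16 kHz sample rate.
sr = 16000  # assumption: the sample rate is not stated in the excerpt
audio = np.zeros(12 * 60 * sr, dtype=np.float32)
train, val = split_waveform(audio)
print(len(train) / sr / 60, len(val) / sr / 60)  # -> 9.0 and 3.0 minutes
```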
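The experiment-setup row fully specifies the optimizer, so the configuration can be written down directly. The sketch below mirrors the quoted hyperparameters in PyTorch; the model, dataset, and loss are placeholders rather than the authors' SynthNet implementation, which lives in their repository.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data standing in for SynthNet and its audio dataset;
# the real architecture and loss are in https://github.com/florinsch/synthnet.
model = torch.nn.Linear(128, 256)
dataset = TensorDataset(torch.randn(64, 128), torch.randn(64, 256))
loader = DataLoader(dataset, batch_size=1)  # batch size of 1, as quoted

# Optimizer as described: Adam with lr=1e-3, betas=(0.9, 0.999),
# eps=1e-8 and weight decay 1e-5.
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-5
)

# Up to 200 epochs (the paper notes 100-150 usually suffice per instrument).
for epoch in range(200):
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)  # placeholder loss, not the paper's
        loss.backward()
        optimizer.step()
```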