Learning Features of Music From Scratch
Authors: John Thickstun, Zaid Harchaoui, Sham Kakade
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper defines a multi-label classification task to predict notes in musical recordings, along with an evaluation protocol, and benchmarks several machine learning architectures for this task: i) learning from spectrogram features; ii) end-to-end learning with a neural net; iii) end-to-end learning with a convolutional neural net. These experiments show that end-to-end models trained for note prediction learn frequency-selective filters as a low-level representation of audio. |
| Researcher Affiliation | Academia | John Thickstun¹, Zaid Harchaoui² & Sham M. Kakade¹·² — ¹Department of Computer Science and Engineering, ²Department of Statistics, University of Washington, Seattle, WA 98195, USA. {thickstn,sham}@cs.washington.edu, name@uw.edu |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper makes the MusicNet dataset publicly available at http://homes.cs.washington.edu/~thickstn/musicnet.html (Footnote 2) and mentions a demo using learned features (Footnote 4), but does not explicitly state that the source code for the methodology described in the paper is provided or linked. |
| Open Datasets | Yes | This paper introduces a new large labeled dataset, MusicNet, which is publicly available² as a resource for learning feature representations of music. ... ²http://homes.cs.washington.edu/~thickstn/musicnet.html |
| Dataset Splits | No | The paper states 'We hold out a test set of 3 recordings for all the results reported in this section' and 'We report the precision and recall corresponding to the best F1-score on validation data'. While a test set is specified and a validation set is mentioned for F1-score optimization, the paper does not provide specific percentages, counts, or a detailed methodology for splitting the data into training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. It only mentions the use of the TensorFlow library. |
| Software Dependencies | No | The paper mentions software such as the TensorFlow library, librosa, and mir_eval, but does not specify their version numbers. It refers to "Garritan's Personal Orchestra 4 sample library", but this is a sound-sample library, not a general software dependency with a version. |
| Experiment Setup | Yes | The results reported in Table 3 are achieved with 500 hidden units using a receptive field of 2,048 samples with an 8-sample stride across a window of 16,384 samples. These features are grouped into average pools of width 16, with a stride of 8 features between pools. A max-pooling operation yields similar results. The learned representations are optimized for square loss with SGD using the TensorFlow library (Abadi et al.). |
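The windowing and pooling geometry quoted in the Experiment Setup row can be sanity-checked arithmetically. The sketch below (hypothetical helper names, not the authors' code) counts the sliding-window positions implied by a 2,048-sample receptive field striding by 8 across a 16,384-sample window, and then the average pools of width 16 with stride 8 applied on top:

```python
# Sketch of the window/pooling geometry described in the paper's
# experiment setup. Helper names are illustrative, not from the paper.

WINDOW = 16_384          # samples per input window
RECEPTIVE_FIELD = 2_048  # samples seen by each hidden unit
FRAME_STRIDE = 8         # sample stride between receptive fields
POOL_WIDTH = 16          # consecutive features averaged per pool
POOL_STRIDE = 8          # feature stride between pools

def num_positions(length: int, width: int, stride: int) -> int:
    """Number of valid sliding-window positions (no padding)."""
    return (length - width) // stride + 1

def avg_pool(features: list[float],
             width: int = POOL_WIDTH,
             stride: int = POOL_STRIDE) -> list[float]:
    """Average-pool a 1-D sequence of per-frame feature values."""
    n = num_positions(len(features), width, stride)
    return [sum(features[i * stride : i * stride + width]) / width
            for i in range(n)]

frames = num_positions(WINDOW, RECEPTIVE_FIELD, FRAME_STRIDE)
pools = num_positions(frames, POOL_WIDTH, POOL_STRIDE)
print(frames, pools)  # 1793 frames, pooled down to 223 features per unit
```

So each hidden unit produces 1,793 frame activations per window, which the width-16/stride-8 average pooling reduces to 223 pooled features; swapping `sum(...)/width` for `max(...)` gives the max-pooling variant the paper says performs similarly.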