Pseudo-Supervised Training Improves Unsupervised Melody Segmentation
Authors: Stefan Lattner, Carlos Eduardo Cancino Chacón, Maarten Grachten
IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we demonstrate that, remarkably, a substantial increase in segmentation accuracy can be obtained by not using information content estimates directly, but rather in a bootstrapping fashion. More specifically, we use information content estimates computed from a generative model of the data as a target for a feed-forward neural network that is trained to estimate the information content directly from the data. We hypothesize that the improved segmentation accuracy of this bootstrapping approach may be evidence that the generative model provides noisy estimates of the information content, which are smoothed by the feed-forward neural network, yielding more accurate information content estimates. |
| Researcher Affiliation | Academia | Stefan Lattner, Carlos Eduardo Cancino Chacón and Maarten Grachten, Austrian Research Institute for Artificial Intelligence, Freyung 6/6, 1010 Vienna, Austria, http://www.ofai.at/research/impml |
| Pseudocode | Yes | Algorithm 1: Pseudo-supervised training. Data: set of n-grams $V = \{v_1, \dots, v_N\}$. (1) Train an RBM by optimizing the model parameters $\Theta$ as $\arg\max_\Theta \log p(v \mid \Theta)$ (Eq. 6). (2) Compute the set of pseudo-targets $T = \{t_1, \dots, t_N\}$ as $t(v_t; \Theta) = h(e_t \mid e_{t-n+1}^{t-1})$ (Eq. 7), where $v_t$ is the encoding of the n-gram $\{e_{t-n+1}, \dots, e_t\}$ and $h(e_t \mid e_{t-n+1}^{t-1})$ is the IC computed as in Eq. (5). (3) Build a three-layered FFNN and optimize it in a supervised way against the pseudo-targets $T$ by minimizing $\lVert t(v_t; \Theta) - y(v_t; \Lambda) \rVert^2$ (Eq. 8), where $y(v_t; \Lambda)$ is the output of the FFNN for $v_t$ given the model parameters $\Lambda$. (4) Return model parameters $\hat{\Lambda}$. (A runnable sketch of this procedure is given below the table.) |
| Open Source Code | No | The paper does not provide a direct link or explicit statement about the public release of its source code. It only mentions thanking Marcus Pearce for sharing the Essen data. |
| Open Datasets | Yes | For training and testing, we use the Essen Folk Song Collection (EFSC) [Schaffrath, 1995], a widely used corpus in music information retrieval (MIR). |
| Dataset Splits | Yes | For each n-gram length, we perform 5-fold crossvalidation and average the results over all folds. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific CPU or GPU models. |
| Software Dependencies | No | The paper mentions using a 'three layered FFNN with sigmoid hidden units' and training with 'PCD' and 'Backpropagation', but it does not specify software dependencies with version numbers (e.g., Python, specific libraries like TensorFlow or PyTorch, or their versions). |
| Experiment Setup | Yes | For the initial IC estimation, we use 200 hidden units, a momentum of 0.6, and a learning rate of 0.0085, which we linearly decrease to zero during training. With increasing n-gram length we linearly adapt the batch size from 250 to 1000. In addition, we use 50% dropout on the hidden layer and 20% dropout on the visible layer. The fast weights used in the training algorithm (see Section 3.1) help the fantasy particles mix well, even with small learning rates. The learning rate of the fast weights is increased from 0.002 to 0.007 during training. The training is continued until convergence of the parameters (typically between 100 and 300 epochs). The sparsity parameters (see Section 3.1) are set to µ = 0.04 and φ = 0.65, respectively. In addition, we use a value of 0.0035 for L2 weight regularization, which penalizes large weight coefficients. For pre-training of the first layer in the FFNN, we change the learning rate to 0.005, leave the batch size constant at 250 and increase the weight regularization to 0.01. We again use dropout for both the pre-training and the fine-tuning. (These values are collected in the configuration sketch below the table.) |
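
The algorithm row above maps naturally onto a short script. The following is a minimal NumPy sketch of the pseudo-supervised pipeline, assuming binary n-gram encodings: the CD-1 RBM, the free-energy-based information-content proxy in `pseudo_targets`, and the plain-gradient-descent FFNN are simplified stand-ins for the paper's PCD-trained RBM with fast weights, sparsity and dropout, and all function and variable names are illustrative rather than taken from the paper.

```python
"""Minimal sketch of Algorithm 1 (pseudo-supervised training).

Assumptions: binary n-gram encodings, a CD-1 RBM instead of PCD with fast
weights/sparsity/dropout, and a free-energy contrast as a stand-in for the
paper's information-content estimate h(e_t | e_{t-n+1}^{t-1}).
"""
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# --- Step 1: train an RBM on the n-gram encodings (CD-1, simplified) ---
def train_rbm(V, n_hidden=200, lr=0.0085, epochs=50):
    n_visible = V.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        h_prob = sigmoid(V @ W + b_h)                      # positive phase
        h_state = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h_state @ W.T + b_v)             # one Gibbs step
        h_recon = sigmoid(v_recon @ W + b_h)               # negative phase
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / len(V)
        b_v += lr * (V - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_v, b_h

def free_energy(v, W, b_v, b_h):
    return -v @ b_v - np.logaddexp(0.0, v @ W + b_h).sum(axis=-1)

# --- Step 2: pseudo-targets: IC of the last event given its context ---
def pseudo_targets(V, rbm, last_event_cols):
    """Proxy for h(e_t | context): contrast the free energy of the full
    n-gram with that of the context alone (a hypothetical approximation)."""
    W, b_v, b_h = rbm
    context_only = V.copy()
    context_only[:, last_event_cols] = 0.0
    return free_energy(V, W, b_v, b_h) - free_energy(context_only, W, b_v, b_h)

# --- Step 3: train a FFNN to regress the pseudo-targets (squared error) ---
def train_ffnn(V, t, n_hidden=100, lr=0.005, epochs=200):
    W1 = 0.01 * rng.standard_normal((V.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = 0.01 * rng.standard_normal((n_hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = sigmoid(V @ W1 + b1)
        y = (h @ W2 + b2).ravel()
        err = (y - t)[:, None]                             # dMSE/dy
        dW2, db2 = h.T @ err / len(V), err.mean(axis=0)
        dh = (err @ W2.T) * h * (1.0 - h)
        dW1, db1 = V.T @ dh / len(V), dh.mean(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

# Toy run: 500 random "n-grams" of 10 events x 12 pitch classes each.
V = (rng.random((500, 120)) < 0.1).astype(float)
rbm = train_rbm(V)
t = pseudo_targets(V, rbm, last_event_cols=slice(108, 120))
ffnn = train_ffnn(V, t)
```

The FFNN's output then replaces the RBM's noisier information-content estimate as the segmentation signal, which is the bootstrapping step the abstract credits with the accuracy gain.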
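
The hyperparameters quoted in the "Experiment Setup" row can likewise be gathered into a single configuration sketch. Only the numeric values come from the paper; the dictionary keys and the `linear` ramp helper are assumptions about how the reported linear schedules (learning rate, batch size, fast-weight learning rate) might be realized.

```python
def linear(start, end, step, n_steps):
    """Linear ramp from `start` to `end` over `n_steps` updates (assumed
    interpretation of the paper's 'linearly decrease/increase/adapt')."""
    return start + (end - start) * step / max(n_steps - 1, 1)

# RBM used for the initial IC estimation (values as reported in the paper).
RBM_CONFIG = {
    "n_hidden": 200,
    "momentum": 0.6,
    "learning_rate": lambda step, n_steps: linear(0.0085, 0.0, step, n_steps),
    "batch_size": lambda ngram_len, max_len: int(linear(250, 1000, ngram_len, max_len)),
    "dropout_hidden": 0.5,
    "dropout_visible": 0.2,
    "fast_weight_lr": lambda step, n_steps: linear(0.002, 0.007, step, n_steps),
    "sparsity_mu": 0.04,
    "sparsity_phi": 0.65,
    "l2_weight_decay": 0.0035,
    "epochs_until_convergence": (100, 300),   # typical range reported
}

# Pre-training of the FFNN's first layer (fine-tuning also uses dropout).
FFNN_PRETRAIN_CONFIG = {
    "learning_rate": 0.005,
    "batch_size": 250,
    "l2_weight_decay": 0.01,
    "dropout": True,
}
```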