Weakly-Supervised Temporal Localization via Occurrence Count Learning
Authors: Julien Schroeter, Kirill Sidorov, David Marshall
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the effectiveness of our approach in a number of experiments (drum hit and piano onset detection in audio, digit detection in images) and demonstrate performance comparable to that of fully-supervised state-of-the-art methods, despite much weaker training requirements. |
| Researcher Affiliation | Academia | Julien Schroeter, Kirill Sidorov, David Marshall (Cardiff University, United Kingdom). Correspondence to: Julien Schroeter <SchroeterJ1@cardiff.ac.uk>. |
| Pseudocode | No | The paper describes the model and its components in text and mathematical formulas but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The full implementation and additional details can be found on the paper's website: http://users.cs.cf.ac.uk/SchroeterJ1/publications/LoCo |
| Open Datasets | Yes | The model is evaluated on two different datasets: IDMT-SMT-Drums (Dittmar et al., 2014) and ENST-Drums (Gillet & Richard, 2006). The MAPS database is used for this evaluation. As in Hawthorne et al. (2017), the synthesized pieces are used for training, whereas the Disklavier pieces are used for testing. The well-known MNIST dataset (LeCun et al., 1998) is used to generate samples for this experiment. |
| Dataset Splits | No | The paper discusses training and test sets and cross-validation, but does not explicitly define a validation split with specific percentages or sample counts used during training. |
| Hardware Specification | No | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research. This is the only mention of hardware, and it does not specify the GPU model or any other hardware details for the experiments. |
| Software Dependencies | No | The paper mentions “LSTM (Hochreiter & Schmidhuber, 1997) or GRU (Cho et al., 2014)”, “Adam algorithm (Kingma & Ba, 2015)”, but no specific software dependencies with version numbers are listed. |
| Experiment Setup | Yes | First, the representation learning part of the network is composed of six (3×4) convolutional layers with 8 to 16 filters intertwined with max-pooling layers and ReLU activations. Second, the recurrent unit is a 24-unit LSTM, directly followed by a final 16-node fully-connected prediction layer. The LoCo loss described in Section 4.2 is optimized using the Adam algorithm (Kingma & Ba, 2015). (T_max: 400, k_max: 31) A hedged sketch of this architecture follows the table. |
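The quoted setup fixes the layer counts and sizes but not the full wiring. Below is a minimal Keras sketch of one plausible reading, assuming a spectrogram input 64 frequency bins high, "same" convolution padding, max-pooling along the frequency axis only, a specific 8-to-16 filter progression, and a softmax on the prediction layer; none of these choices are stated in the paper excerpt, and the paper's LoCo loss is replaced by a generic placeholder purely so the snippet compiles.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# T_max = 400 comes from the row above; the spectrogram height is an
# assumed placeholder, not a value given in the paper excerpt.
T_MAX, FREQ_BINS = 400, 64

def build_sketch_model():
    """Plausible reading of the described architecture: six (3x4) conv
    layers with 8-16 filters and ReLU, intertwined with max-pooling,
    then a 24-unit LSTM and a 16-node fully-connected prediction layer."""
    inp = layers.Input(shape=(T_MAX, FREQ_BINS, 1))
    x = inp
    # Assumed filter progression; the excerpt only gives the 8-16 range.
    for filters in (8, 8, 12, 12, 16, 16):
        x = layers.Conv2D(filters, kernel_size=(3, 4),
                          padding="same", activation="relu")(x)
        # Pool only along frequency (assumption) so every time step survives.
        x = layers.MaxPooling2D(pool_size=(1, 2))(x)
    # 64 bins halved six times -> 1; flatten freq/channel axes per frame.
    x = layers.Reshape((T_MAX, -1))(x)
    x = layers.LSTM(24, return_sequences=True)(x)
    # Output activation is an assumption; the excerpt only gives the width.
    out = layers.Dense(16, activation="softmax")(x)
    return models.Model(inp, out)

model = build_sketch_model()
# Stand-in loss: the paper's LoCo count loss is not reproduced here.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy")
model.summary()
```

Pooling only along the frequency axis keeps one feature vector per frame for all T_max steps, which is what a recurrent, count-based training signal would need; the actual pooling shapes in the paper may differ.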