Weakly-Supervised Temporal Localization via Occurrence Count Learning

Authors: Julien Schroeter, Kirill Sidorov, David Marshall

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate the effectiveness of our approach in a number of experiments (drum hit and piano onset detection in audio, digit detection in images) and demonstrate performance comparable to that of fully-supervised state-of-the-art methods, despite much weaker training requirements."
Researcher Affiliation | Academia | "Julien Schroeter¹, Kirill Sidorov¹, David Marshall¹. ¹Cardiff University, United Kingdom. Correspondence to: Julien Schroeter <SchroeterJ1@cardiff.ac.uk>."
Pseudocode | No | The paper describes the model and its components in text and mathematical formulas but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The full implementation and additional details can be found on the paper's website¹." ¹http://users.cs.cf.ac.uk/SchroeterJ1/publications/LoCo
Open Datasets | Yes | "The model is evaluated on two different datasets: IDMT-SMT-Drums (Dittmar et al., 2014) and ENST-Drums (Gillet & Richard, 2006)." "The MAPS database is used for this evaluation. As in (Hawthorne et al., 2017), the synthesized pieces are used for training, whereas the Disklavier pieces are used for testing." "The well-known MNIST (LeCun et al., 1998) dataset is used to generate samples for this experiment."
Dataset Splits | No | The paper discusses training and testing sets, as well as cross-validation, but does not explicitly define a validation split (with specific percentages or sample counts) used during model training.
Hardware Specification | No | "We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research." This is the only mention of hardware; it does not specify the GPU model or any other details of the experimental setup.
Software Dependencies | No | The paper mentions methods such as "LSTM (Hochreiter & Schmidhuber, 1997) or GRU (Cho et al., 2014)" and the "Adam algorithm (Kingma & Ba, 2015)", but lists no specific software dependencies with version numbers.
Experiment Setup | Yes | "First, the representation learning part of the network is composed of six (3×4) convolutional layers with 8 to 16 filters intertwined with max-pooling layers and ReLU activations. Secondly, the recurrent unit is comprised of a 24-unit LSTM, which is then directly followed by a final 16-node fully-connected prediction layer. The LoCo-loss described in Section 4.2 is optimized using the Adam algorithm (Kingma & Ba, 2015)." (Tmax = 400, kmax = 31)
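
To make the quoted setup concrete, here is a minimal PyTorch sketch of such a CNN + LSTM pipeline. The input shape (a single-channel spectrogram with 64 frequency bins), the per-layer filter schedule, the placement of the pooling layers, and the sigmoid output are all assumptions not stated in the excerpt; the authors' released implementation is the authoritative reference.

```python
import torch
import torch.nn as nn

class LoCoNet(nn.Module):
    """Sketch of the described architecture: six (3x4) conv layers with
    8-16 filters, max-pooling and ReLU, a 24-unit LSTM, and a final
    16-node fully-connected prediction layer."""

    def __init__(self, n_bins=64, n_out=16):
        super().__init__()
        chans = [1, 8, 8, 8, 16, 16, 16]   # assumed 8-to-16 filter schedule
        layers = []
        for i in range(6):
            # padding='same' keeps one CNN output column per input frame
            layers += [nn.Conv2d(chans[i], chans[i + 1], kernel_size=(3, 4),
                                 padding='same'),
                       nn.ReLU()]
            if i % 2 == 1:                 # assumed: pool frequency axis only
                layers.append(nn.MaxPool2d(kernel_size=(2, 1)))
        self.cnn = nn.Sequential(*layers)
        feat = chans[-1] * (n_bins // 2 ** 3)  # channels x remaining freq bins
        self.lstm = nn.LSTM(feat, 24, batch_first=True)
        self.fc = nn.Linear(24, n_out)

    def forward(self, x):                    # x: (batch, 1, n_bins, T)
        h = self.cnn(x)                      # (batch, 16, n_bins/8, T)
        h = h.flatten(1, 2).transpose(1, 2)  # (batch, T, feat)
        h, _ = self.lstm(h)                  # (batch, T, 24)
        return torch.sigmoid(self.fc(h))     # per-frame probabilities

model = LoCoNet()
probs = model(torch.randn(2, 1, 64, 400))    # Tmax = 400 -> (2, 400, 16)
```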
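
The LoCo-loss itself is referenced but not reproduced in the excerpt. Below is a hedged sketch of a count-based loss in its spirit: the negative log-likelihood of the weak count label under the Poisson binomial distribution induced by the per-frame probabilities, computed with the standard Bernoulli-convolution recursion and truncated at kmax = 31. Whether this matches the paper's Section 4.2 exactly is an assumption; consult the paper for the precise definition.

```python
import torch

def count_nll(p, counts, k_max=31, eps=1e-8):
    """NLL of observed occurrence counts given per-frame probabilities.

    p:      (batch, T) event probabilities in [0, 1]
    counts: (batch,)   integer count labels (the weak supervision)
    """
    dist = p.new_zeros(p.shape[0], k_max + 1)  # dist[:, k] = P(count == k)
    dist[:, 0] = 1.0
    for t in range(p.shape[1]):
        pt = p[:, t:t + 1]
        shifted = torch.cat([torch.zeros_like(dist[:, :1]), dist[:, :-1]], 1)
        # one Bernoulli convolution step; mass beyond k_max is discarded
        dist = dist * (1 - pt) + shifted * pt
    lik = dist.gather(1, counts.clamp(max=k_max).unsqueeze(1)).squeeze(1)
    return -(lik + eps).log().mean()

p = torch.sigmoid(torch.randn(2, 400))     # toy per-frame probabilities
loss = count_nll(p, torch.tensor([3, 7]))  # counts 3 and 7 as weak labels
```

Because the recursion only ever needs the distribution from the previous frame, it can run inside the recurrent loop at O(T · kmax) cost, which is consistent with the capped kmax reported in the setup.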