A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization

Authors: Ashraful Islam, Chengjiang Long, Richard Radke

AAAI 2021, pp. 1637-1645

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on two popular action localization datasets: THUMOS14 (Jiang et al. 2014) and ActivityNet 1.2 (Caba Heilbron et al. 2015). Table 2 summarizes performance comparisons between our proposed HAM-Net and state-of-the-art fully-supervised and weakly-supervised TAL methods on the THUMOS14 dataset.
Researcher Affiliation | Collaboration | Ashraful Islam¹, Chengjiang Long², Richard Radke¹ (¹Rensselaer Polytechnic Institute, ²JD Digits AI Lab)
Pseudocode | No | The paper describes its method using mathematical equations and a diagram (Figure 2), but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate our approach on two popular action localization datasets: THUMOS14 (Jiang et al. 2014) and ActivityNet 1.2 (Caba Heilbron et al. 2015).
Dataset Splits | Yes | THUMOS14 contains 200 validation videos used for training and 213 test videos used for evaluation, with 20 action categories. ActivityNet 1.2 contains 4,819 videos for training and 2,382 videos for testing, with 200 action classes. During training we randomly sample 500 snippets for THUMOS14 and 80 snippets for ActivityNet, and during evaluation we take all the snippets (see the sketch after this table).
Hardware Specification | No | The paper mentions using an I3D network for feature extraction but does not provide any details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the Adam optimizer (Kingma and Ba 2015) and the I3D network (Carreira and Zisserman 2017) but does not specify version numbers for these or any other software components (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We use the Adam (Kingma and Ba 2015) optimizer with learning rate 0.00001, and train for 100 epochs for THUMOS14 and 20 epochs for ActivityNet. For THUMOS14, we set λ0 = λ1 = 0.8, λ2 = λ3 = 0.2, α = β = 0.8, γ = 0.2, and k = 50 for top-k temporal pooling. For ActivityNet, we set α = 0.5, β = 0.1, λ0 = λ1 = λ2 = λ3 = 0.5, and k = 4, and apply additional average pooling to post-process the final CAS. All the hyperparameters are determined by grid search.
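
To make the Dataset Splits and Experiment Setup rows concrete, below is a minimal PyTorch sketch of the reported THUMOS14 training configuration: randomly sampling 500 snippets per video, Adam with learning rate 1e-5 for 100 epochs, top-k temporal pooling with k = 50, and λ-weighted losses. This is not HAM-Net itself: the tiny linear classifier, feature dimension, synthetic data, and the single instantiated loss term are placeholders standing in for the paper's architecture and full loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Values reported in the paper for THUMOS14; everything else is a stand-in.
NUM_CLASSES, FEAT_DIM, NUM_SNIPPETS, K = 20, 2048, 500, 50
LAMBDAS = (0.8, 0.8, 0.2, 0.2)  # λ0..λ3 for THUMOS14


class ToyCAS(nn.Module):
    """Placeholder for HAM-Net: maps I3D snippet features to a class
    activation sequence (CAS) of shape (batch, T, num_classes)."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(FEAT_DIM, NUM_CLASSES)

    def forward(self, x):
        return self.fc(x)


def sample_snippets(features, n=NUM_SNIPPETS):
    """Randomly sample a fixed number of snippets during training
    (500 for THUMOS14, 80 for ActivityNet 1.2); at test time all
    snippets are kept instead."""
    idx, _ = torch.sort(torch.randint(0, features.shape[0], (n,)))
    return features[idx]


def topk_pool(cas, k=K):
    """Top-k temporal pooling: mean of the k largest activations per class."""
    vals, _ = torch.topk(cas, k=min(k, cas.shape[1]), dim=1)
    return vals.mean(dim=1)  # (batch, num_classes) video-level scores


model = ToyCAS()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # lr from the paper

for epoch in range(100):  # 100 epochs for THUMOS14 (20 for ActivityNet 1.2)
    # Synthetic stand-in for one untrimmed video's I3D features and its
    # video-level multi-hot label (the only supervision in weakly-supervised TAL).
    video_feats = torch.randn(1200, FEAT_DIM)
    labels = torch.zeros(1, NUM_CLASSES)
    labels[0, 3] = 1.0

    snippets = sample_snippets(video_feats).unsqueeze(0)  # (1, 500, 2048)
    cas = model(snippets)                                 # (1, 500, 20)
    video_scores = topk_pool(cas)                         # (1, 20)

    # The paper's total loss is λ0*L0 + λ1*L1 + λ2*L2 + λ3*L3; only a
    # video-level classification term is instantiated in this sketch.
    loss = LAMBDAS[0] * F.binary_cross_entropy_with_logits(video_scores, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

For ActivityNet 1.2 the analogous settings from the same row would be 80 sampled snippets, k = 4, 20 epochs, and all λ values set to 0.5, plus the additional average pooling over the final CAS.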