Finding Action Tubes with a Sparse-to-Dense Framework

Authors: Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Limin Wang, Shugong Xu (pp. 11466-11473)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive to state-of-the-art methods.
Researcher Affiliation | Collaboration | Yuxi Li,1 Weiyao Lin,1,2 Tao Wang,1 John See,3 Rui Qian,1 Ning Xu,4 Limin Wang,5 Shugong Xu2 1School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, China 2Shanghai Institute for Advanced Communication and Data Science, Shanghai University, China 3Faculty of Computing and Informatics, Multimedia University, Malaysia 4Adobe Research, USA 5State Key Laboratory for Novel Software Technology, Nanjing University, China
Pseudocode | Yes | Algorithm 1: Ground-truth dynamic level generation
Open Source Code | No | The related project page will be posted at: http://min.sjtu.edu.cn/lwydemo/Tube.html. This is a promise of a future release via a project page, not direct, immediate access to the source code for the methodology.
Open Datasets | Yes | We conduct our experiment on three common datasets UCF101-24, UCFSports and JHMDB-21 datasets. ... The UCF101-24 dataset (Soomro, Zamir, and Shah 2012) contains 3,207 untrimmed videos... The JHMDB-21 is a subset of HMDB-51 dataset (Jhuang et al. 2013)... The UCFSports dataset (Rodriguez, Ahmed, and Shah 2008) contains 150 videos of 10 sport action classes.
Dataset Splits | Yes | Following previous works (Saha et al. 2016), we report results for the first split. ... The results are reported as the average performance over 3 train-test splits. ... We report the result on the standard split.
Hardware Specification | Yes | Our model is implemented on an NVIDIA Titan 1080 GPU.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) are mentioned; only the I3D network is cited as a feature extractor.
Experiment Setup | Yes | During training, we use length T of videos sampled to 96 frames for UCF101, 32 frames for JHMDB-21 and 48 frames for UCFSports. We set the hyperparameters as: α = 2, ϵ = 0.7, γ = 0.1. ... We use the SGD solver and train our network with an accumulative batch size of 8.
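The "accumulative batch size of 8" in the experiment setup is typically realized via gradient accumulation: gradients from several small micro-batches are summed and the SGD update is applied only once per accumulated batch, emulating a larger batch under tight GPU memory. The paper does not publish its training code, so the following is a minimal pure-Python sketch of the mechanism on a toy linear-regression problem; the model, learning rate, micro-batch size, and data are all illustrative assumptions, not the paper's actual values (only the accumulated batch size of 8 comes from the quote above).

```python
# Gradient accumulation sketch: sum gradients over micro-batches, then take
# one SGD step per accumulated batch of `accum_batch` samples.
# Toy model y = w*x + b with MSE loss; all numbers here are illustrative.

def grad_mse(w, b, xs, ys):
    """Mean-squared-error gradients for y = w*x + b on one micro-batch."""
    n = len(xs)
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return gw, gb

def sgd_accumulate(xs, ys, lr=0.01, micro=2, accum_batch=8, epochs=2000):
    """Train with micro-batches of `micro`, updating once per `accum_batch`."""
    w, b = 0.0, 0.0
    steps_per_update = accum_batch // micro  # micro-batches per SGD step
    gw_sum = gb_sum = 0.0
    count = 0
    for _ in range(epochs):
        for i in range(0, len(xs), micro):
            gw, gb = grad_mse(w, b, xs[i:i + micro], ys[i:i + micro])
            gw_sum += gw          # accumulate instead of stepping
            gb_sum += gb
            count += 1
            if count == steps_per_update:
                # Average over micro-batches == gradient of the full batch of 8
                w -= lr * gw_sum / steps_per_update
                b -= lr * gb_sum / steps_per_update
                gw_sum = gb_sum = 0.0
                count = 0
    return w, b

xs = [float(i) for i in range(8)]
ys = [2.0 * x + 1.0 for x in xs]  # ground truth: w = 2, b = 1
w, b = sgd_accumulate(xs, ys)
```

Because no parameter update happens between the micro-batches of one accumulation cycle, the averaged accumulated gradient equals the full-batch gradient, so the procedure is mathematically equivalent to training with the larger batch size.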