Long Short-Term Transformer for Online Action Detection

Authors: Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared to prior work, LSTR provides an effective and efficient method to model long videos with fewer heuristics, which is validated by extensive empirical analysis. LSTR achieves state-of-the-art performance on three standard online action detection benchmarks: THUMOS'14, TVSeries, and HACS Segment.
Researcher Affiliation | Industry | All authors (Mingze Xu, Yuanjun Xiong, Hao Chen, Xinyu Li, Wei Xia, Zhuowen Tu, Stefano Soatto) are affiliated with Amazon/AWS AI ({xumingze,yuanjx,hxen,xxnl,wxia,ztu,soattos}@amazon.com).
Pseudocode | No | The paper describes the architecture and processes of LSTR in prose and through diagrams (Figure 1, Figure 2), but does not include any formal pseudocode or algorithm blocks. A hedged sketch of the described two-memory design is given after this table.
Open Source Code | Yes | Code has been made available at: https://xumingze0308.github.io/projects/lstr.
Open Datasets | Yes | We evaluate our model on three publicly-available datasets: THUMOS'14 [30], TVSeries [14] and HACS Segment [76].
Dataset Splits | Yes | THUMOS'14 ... train on the validation set (200 untrimmed videos) and evaluate on the test set (213 untrimmed videos). ... HACS Segment ... It contains 35,300 untrimmed videos over 200 human action classes for training and 5,530 untrimmed videos for validation. (The quoted splits are restated as a small configuration after this table.)
Hardware Specification | Yes | We implemented our proposed model in PyTorch [1], and performed all experiments on a system with 8 Nvidia V100 graphics cards.
Software Dependencies | No | We implemented our proposed model in PyTorch [1]. The paper names PyTorch but does not give a version number, which is needed for a reproducible description of software dependencies.
Experiment Setup | Yes | For all Transformer units, we set their number of heads as 16 and hidden units as 1024 dimensions. To learn model weights, we used the Adam [34] optimizer with weight decay 5×10⁻⁵. The learning rate was linearly increased from zero to 5×10⁻⁵ in the first 2/5 of training iterations and then reduced to zero following a cosine function. Our models were optimized with batch size of 16, and the training was terminated after 25 epochs.
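
Since the paper provides no formal pseudocode, the following is a minimal PyTorch sketch of the two-memory design it describes in prose: a long-term memory compressed into a small set of latent vectors by transformer decoder units, which the short-term memory then queries to produce per-frame action scores. Only the head count (16) and hidden size (1024) follow the quoted setup; the number of latent tokens, layer counts, and number of classes are illustrative assumptions, not the authors' exact configuration.

    import torch
    import torch.nn as nn

    class LSTRSketch(nn.Module):
        """Illustrative two-memory model: compress long-term memory, decode short-term memory."""

        def __init__(self, dim=1024, heads=16, num_latents=16, num_layers=2, num_classes=22):
            super().__init__()
            # Learnable latent tokens that summarize the long-term memory (count is an assumption).
            self.latents = nn.Parameter(torch.randn(num_latents, dim))
            # Long-term memory compression: latents cross-attend to the long memory.
            self.compressor = nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True),
                num_layers=num_layers)
            # Short-term memory decoder: recent frames cross-attend to the compressed memory.
            self.decoder = nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True),
                num_layers=num_layers)
            self.classifier = nn.Linear(dim, num_classes)  # per-frame action scores

        def forward(self, long_mem, short_mem):
            # long_mem: (B, T_long, dim) frame features; short_mem: (B, T_short, dim).
            latents = self.latents.unsqueeze(0).expand(long_mem.size(0), -1, -1)
            compressed = self.compressor(tgt=latents, memory=long_mem)
            decoded = self.decoder(tgt=short_mem, memory=compressed)
            return self.classifier(decoded)  # (B, T_short, num_classes)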
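
For concreteness, the quoted dataset splits can be written down as a small configuration. The dictionary below only restates the numbers reported in the row above; the key names are arbitrary, and the TVSeries split is omitted because it is not quoted in this report.

    # Split configuration restating the quoted numbers; key names are hypothetical.
    DATASET_SPLITS = {
        "THUMOS14": {
            # Training uses the original validation set, evaluation the test set.
            "train": {"split": "validation", "num_videos": 200},
            "eval": {"split": "test", "num_videos": 213},
        },
        "HACS_Segment": {
            "train": {"num_videos": 35300, "num_classes": 200},
            "eval": {"split": "validation", "num_videos": 5530},
        },
    }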
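
The optimization recipe in the last row (Adam, weight decay 5×10⁻⁵, linear warmup to 5×10⁻⁵ over the first 2/5 of iterations, cosine decay to zero, batch size 16, 25 epochs) can be sketched as follows. The function name and the per-iteration stepping are assumptions, and the model is a placeholder.

    import math
    import torch

    def build_optimizer_and_scheduler(model, total_iters, base_lr=5e-5, weight_decay=5e-5):
        """Adam with linear warmup over the first 2/5 of iterations, then cosine decay to zero."""
        warmup_iters = int(0.4 * total_iters)  # first 2/5 of all training iterations
        optimizer = torch.optim.Adam(model.parameters(), lr=base_lr, weight_decay=weight_decay)

        def lr_lambda(it):
            if it < warmup_iters:
                return it / max(1, warmup_iters)  # linear warmup from zero to base_lr
            progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
            return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to zero

        scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
        return optimizer, scheduler

    # Usage sketch: with batch size 16 and 25 epochs, total_iters = 25 * len(train_loader),
    # and scheduler.step() is called once per training iteration.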