T-C3D: Temporal Convolutional 3D Network for Real-Time Action Recognition

Authors: Kun Liu, Wu Liu, Chuang Gan, Mingkui Tan, Huadong Ma

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On two challenging benchmark datasets, UCF101 and HMDB51, our method is significantly better than state-of-the-art real-time methods by over 5.4% in terms of accuracy and 2 times faster in terms of inference speed (969 frames per second), demonstrating comparable recognition performance to the state-of-the-art methods.
Researcher Affiliation | Academia | Kun Liu (1), Wu Liu (1), Chuang Gan (2), Mingkui Tan (3), Huadong Ma (1); (1) Beijing Key Lab of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China; (2) Tsinghua University, Beijing, China; (3) South China University of Technology, Guangzhou, China. Email: {liu kun, liuwu, mhd}@bupt.edu.cn; ganchuang1990@gmail.com; mingkuitan@scut.edu.cn
Pseudocode | No | The paper provides mathematical formulations and a system diagram, but it does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The source code for the complete system as well as the pre-trained models are publicly available at https://github.com/tc3d.
Open Datasets | Yes | We empirically evaluate our T-C3D approach on the two public benchmark datasets for action recognition: UCF101 (Soomro, Zamir, and Shah 2012) and HMDB51 (Kuehne et al. 2011).
Dataset Splits | Yes | For both datasets, we adopt the three standard training/testing splits provided in original works as the evaluation scheme and report the mean accuracy over these three splits.
Hardware Specification | Yes | As for speed evaluation, we adopt FPS as metric and conduct experiments on a CPU (E5-2640 v3) and a K40 GPU.
Software Dependencies | No | The paper describes the methods and algorithms used (e.g., "mini-batch stochastic gradient descent algorithm"), but it does not specify software components with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | The network parameters are learned in an end-to-end fashion with the mini-batch stochastic gradient descent algorithm, where the momentum is set to 0.9 and the batch size is set to 8. The pre-trained models on Sport-1M and Kinetics are utilized to initialize network weights. We randomly initialize the last fully connected layer and add a dropout layer after the global pooling layer with high dropout ratio (set to 0.8 in experiments) to prevent over-fitting. On UCF101, the initial learning rate is 0.005 and decreased to its 1/10 every 8,000 iterations. The whole optimization procedure is stopped at 20,000 iterations.
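
The Experiment Setup row quotes concrete optimizer settings: mini-batch SGD with momentum 0.9, batch size 8, dropout 0.8 after global pooling, initial learning rate 0.005 decayed by 10x every 8,000 iterations, and 20,000 iterations in total. The snippet below is a minimal PyTorch sketch of that schedule only; the `TC3DHead` stand-in module, its feature dimension, and the random inputs are illustrative assumptions and not the authors' released code (hosted at https://github.com/tc3d).

```python
# Minimal sketch of the reported training schedule (assumption: PyTorch;
# the T-C3D backbone itself is not reproduced here).
import torch
import torch.nn as nn

class TC3DHead(nn.Module):
    """Hypothetical stand-in for the classifier head: dropout 0.8 after
    global pooling and a randomly initialized final fully connected layer."""
    def __init__(self, feat_dim=4096, num_classes=101):
        super().__init__()
        self.dropout = nn.Dropout(p=0.8)            # high dropout ratio (0.8)
        self.fc = nn.Linear(feat_dim, num_classes)  # randomly initialized

    def forward(self, pooled_features):
        return self.fc(self.dropout(pooled_features))

model = TC3DHead()  # in practice: full T-C3D initialized from Sport-1M/Kinetics weights
criterion = nn.CrossEntropyLoss()

# Mini-batch SGD, momentum 0.9, batch size 8, initial LR 0.005 on UCF101,
# decayed to 1/10 every 8,000 iterations, stopped at 20,000 iterations.
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=8000, gamma=0.1)

max_iters, batch_size = 20000, 8
for it in range(max_iters):
    # Random tensors stand in for a real UCF101 clip loader (assumption).
    features = torch.randn(batch_size, 4096)
    labels = torch.randint(0, 101, (batch_size,))
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()  # per-iteration stepping, matching the 8,000-iteration decay
```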
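
Similarly, the Hardware Specification row names FPS as the speed metric but quotes no timing protocol. A throughput measurement along the lines sketched below is one common way a frames-per-second figure such as the reported 969 FPS is obtained; the placeholder model, the 8-frame 112x112 clip shape, and the warm-up/run counts are assumptions for illustration, not the paper's procedure.

```python
# Illustrative FPS measurement sketch (assumption: PyTorch; the paper does not
# publish its timing script). Throughput = frames processed / elapsed seconds.
import time
import torch

def measure_fps(model, clip, n_warmup=10, n_runs=50):
    model.eval()
    frames_per_clip = clip.shape[2]  # layout assumed to be (N, C, T, H, W)
    with torch.no_grad():
        for _ in range(n_warmup):    # warm-up passes excluded from timing
            model(clip)
        if clip.is_cuda:
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_runs):
            model(clip)
        if clip.is_cuda:
            torch.cuda.synchronize()
        elapsed = time.time() - start
    return n_runs * clip.shape[0] * frames_per_clip / elapsed

# Example with a placeholder 3D convolution and an 8-frame 112x112 clip (assumptions).
dummy_model = torch.nn.Conv3d(3, 64, kernel_size=3, padding=1)
dummy_clip = torch.randn(1, 3, 8, 112, 112)
print(f"{measure_fps(dummy_model, dummy_clip):.0f} frames per second")
```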