Trajectory Convolution for Action Recognition
Authors: Yue Zhao, Yuanjun Xiong, Dahua Lin
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the Something-Something V1 and Kinetics datasets show that by explicitly taking the motion dynamics into account in the temporal operation, the proposed network obtains considerable improvements over Separable-3D, a competitive baseline. "To evaluate the effectiveness of our Trajectory Net, we conduct experiments on two benchmark datasets for action recognition: Something-Something V1 [8] and Kinetics [19]." (A hedged sketch of this temporal operation appears after the table.) |
| Researcher Affiliation | Collaboration | Yue Zhao, Department of Information Engineering, The Chinese University of Hong Kong (zy317@ie.cuhk.edu.hk); Yuanjun Xiong, Amazon Rekognition (yuanjx@amazon.com); Dahua Lin, Department of Information Engineering, The Chinese University of Hong Kong (dhlin@ie.cuhk.edu.hk) |
| Pseudocode | No | The paper describes algorithms using mathematical formulations and textual descriptions but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the source code of the methodology described. |
| Open Datasets | Yes | Something-Something V1 [8] is a large-scale crowd-sourced video dataset on human-object interaction. It contains 108,499 video clips in 174 classes. Kinetics [19] is a large-scale video dataset on human-centric activities sourced from YouTube. |
| Dataset Splits | Yes | Something-Something V1: the dataset is split into training, validation, and test subsets in a ratio of roughly 8:1:1. Kinetics: our version contains 240,436, 19,796, and 38,685 clips in the training, validation, and test subsets, respectively. |
| Hardware Specification | Yes | The network is tested on a workstation with Intel(R) Xeon(R) CPU (E5-2640 v3 @2.60GHz) and Nvidia Titan X GPU. |
| Software Dependencies | No | The paper mentions 'OpenCV with CUDA' for the TV-L1 algorithm but does not specify version numbers for key software components or libraries. (See the TV-L1 sketch after the table.) |
| Experiment Setup | Yes | The length of each input clip is 16 frames and the sampling step varies from 1 to 2. For Something-Something V1 the batch size is set to 64, while for Kinetics it is 128. On Kinetics, the network is trained with an initial learning rate of 0.01, which is reduced by 1/10 every 40 epochs; the whole training procedure takes 100 epochs. (See the schedule sketch after the table.) |
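
The operation the Research Type row refers to is trajectory convolution: a temporal convolution that aggregates features along motion paths given by optical flow, rather than at a fixed spatial location across frames. Below is a minimal PyTorch sketch of that idea; the tensor layout, the `warp_by_flow` helper, and the per-channel kernel weights are illustrative assumptions, not the authors' released implementation (which, per the Open Source Code row, is not publicly linked).

```python
# Minimal sketch of trajectory convolution: temporal aggregation along
# flow-given motion trajectories via bilinear sampling (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_by_flow(feat, flow):
    """Bilinearly sample `feat` (N, C, H, W) at positions displaced by
    `flow` (N, 2, H, W); channel 0 is the x displacement, channel 1 is y."""
    n, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)  # (2, H, W)
    pos = grid.unsqueeze(0) + flow                               # (N, 2, H, W)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    pos_x = 2.0 * pos[:, 0] / max(w - 1, 1) - 1.0
    pos_y = 2.0 * pos[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((pos_x, pos_y), dim=-1)              # (N, H, W, 2)
    return F.grid_sample(feat, grid_norm, align_corners=True)


class TrajectoryConv(nn.Module):
    """Temporal kernel size 3: combine each frame with its flow-warped
    neighbors using learned per-channel weights (illustrative)."""
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3, channels) / 3.0)

    def forward(self, x, fwd_flow, bwd_flow):
        # x: (N, C, T, H, W); flows: (N, 2, T, H, W) pointing to t+1 / t-1.
        n, c, t, h, w = x.shape
        out = []
        for ti in range(t):
            center = x[:, :, ti]
            prev = warp_by_flow(x[:, :, max(ti - 1, 0)], bwd_flow[:, :, ti])
            nxt = warp_by_flow(x[:, :, min(ti + 1, t - 1)], fwd_flow[:, :, ti])
            out.append(prev * self.weight[0].view(1, c, 1, 1)
                       + center * self.weight[1].view(1, c, 1, 1)
                       + nxt * self.weight[2].view(1, c, 1, 1))
        return torch.stack(out, dim=2)
```

The key contrast with a plain Separable-3D temporal convolution is the sampling position: instead of reading the same pixel in neighboring frames, the kernel follows the displacement given by the flow field.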
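For the flow mentioned in the Software Dependencies row, the paper used the TV-L1 algorithm via OpenCV with CUDA. Below is a minimal CPU sketch using the `cv2.optflow` bindings from opencv-contrib-python; the CUDA class (`cv2.cuda_OpticalFlowDual_TVL1`) follows the same create/calc pattern. The clip-and-quantize step is common action-recognition practice rather than something the extracted text specifies, and the file names are placeholders.

```python
# Sketch: TV-L1 optical flow extraction with OpenCV (opencv-contrib-python).
import cv2
import numpy as np

tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()

prev = cv2.imread("frame_000.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder paths
curr = cv2.imread("frame_001.jpg", cv2.IMREAD_GRAYSCALE)

flow = tvl1.calc(prev, curr, None)  # (H, W, 2) float32 displacement field

# Common practice: clip displacements to [-20, 20] and quantize to uint8
# so flow can be stored as ordinary image files.
flow_q = np.clip(flow, -20, 20)
flow_q = ((flow_q + 20) * (255.0 / 40.0)).astype(np.uint8)
cv2.imwrite("flow_x_000.jpg", flow_q[:, :, 0])
cv2.imwrite("flow_y_000.jpg", flow_q[:, :, 1])
```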
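The Experiment Setup row translates directly into an optimization schedule. The sketch below assumes SGD with momentum (the optimizer is not named in the extracted text) and uses PyTorch's `StepLR` to implement the reported decay of 1/10 every 40 epochs over 100 epochs; the tiny model and loader are dummy stand-ins so the loop runs end to end.

```python
# Sketch of the reported Kinetics schedule: lr 0.01, /10 every 40 epochs,
# 100 epochs total. Model/loader are placeholders, not the TrajectoryNet.
import torch
from torch import nn

model = nn.Sequential(nn.Flatten(), nn.Linear(16 * 8, 400))  # 400 = Kinetics classes
train_loader = [(torch.randn(4, 16, 8), torch.randint(0, 400, (4,)))]

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

for epoch in range(100):  # the whole training procedure takes 100 epochs
    for clips, labels in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(clips), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # lr is multiplied by 0.1 after epochs 40 and 80
```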