Temporal Bilinear Networks for Video Action Recognition

Authors: Yanghao Li, Sijie Song, Yuqi Li, Jiaying Liu

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we perform experiments on several widely adopted datasets including Kinetics, UCF101 and HMDB51. The effectiveness of our TBNs is validated by comprehensive ablation analyses and comparisons with various state-of-the-art methods."
Researcher Affiliation | Academia | Peking University; lyttonhao@gmail.com, ssj940920@pku.edu.cn, liyuqi.ne@gmail.com, liujiaying@pku.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "Finally, we perform experiments on several widely adopted datasets including Kinetics (Kay et al. 2017), UCF101 (Soomro, Zamir, and Shah 2012) and HMDB51 (Kuehne et al. 2011)."
Dataset Splits | Yes | "Our models are trained on the training set of Kinetics dataset from scratch... There are 80k and 5k videos in training and validation sets."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions "cuDNN (Chetlur et al. 2014)" but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "Our models are trained for 150 epochs with an initial learning rate of 0.1, which is decayed by a factor of 10 after 45, 90, 125 epochs. We use SGD as the optimizer with a weight decay of 0.0005 and batch size of 384. The standard augmentation methods like random cropping and random flipping are adopted during training for all the methods. For TBNs, we set the factor number p as 20 and also adopt the Dropfactor scheme (Li et al. 2017) to mitigate overfitting."
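
The quoted setup maps directly onto a standard SGD schedule. Below is a minimal PyTorch sketch of that schedule (150 epochs, initial LR 0.1 decayed by 10x at epochs 45/90/125, weight decay 0.0005). Only those hyperparameters come from the paper quote; the tiny stand-in model, synthetic batch, momentum value, and loss choice are illustrative assumptions, and the batch size of 384, the augmentations, the factor number p, and the Dropfactor scheme are not reproduced here.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# Stand-in backbone: a real reproduction would use a TBN-equipped video
# network; this tiny classifier only lets the schedule run end to end.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 32 * 32, 400))

# LR 0.1 and weight decay 0.0005 are from the paper quote;
# momentum=0.9 is an assumption (the quote does not state it).
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005)

# "decayed by a factor of 10 after 45, 90, 125 epochs"
scheduler = MultiStepLR(optimizer, milestones=[45, 90, 125], gamma=0.1)

for epoch in range(150):  # "trained for 150 epochs"
    # One synthetic (N, C, T, H, W) clip batch stands in for a DataLoader;
    # the paper's batch size is 384, shrunk here to keep the sketch light.
    clips = torch.randn(4, 3, 8, 32, 32)
    labels = torch.randint(0, 400, (4,))

    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(clips), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

A full reproduction would additionally apply the random cropping and flipping augmentations inside the data pipeline and implement the factorized bilinear layers with p = 20 and Dropfactor, neither of which the quoted setup specifies in enough detail to sketch here.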