Temporal Interlacing Network

Authors: Hao Shao, Shengju Qian, Yu Liu

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We theoretically prove that with a learnable interlacing target, TIN performs equivalently to the regularized temporal convolution network (r-TCN), but gains 4% more accuracy with 6x less latency on six challenging benchmarks, making it the new state-of-the-art method. In this section, we demonstrate the effectiveness of the proposed TIN on many video datasets. We first introduce the datasets used in our experiments. Then we provide a quantitative analysis against the 2D CNN baseline and TSM (Lin, Gan, and Han 2018). We also compare with state-of-the-art results on Something-Something (V1 & V2). Finally, we conduct ablation experiments on TIN and study the functionality of our design.
Researcher Affiliation | Collaboration | Hao Shao (Tsinghua University; SenseTime X-Lab), Shengju Qian (The Chinese University of Hong Kong), Yu Liu (The Chinese University of Hong Kong)
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper; the method is described in prose and mathematical equations.
Open Source Code | Yes | Code is made available to facilitate further research: https://github.com/deepcs233/TIN
Open Datasets | Yes | We conduct experiments on six video recognition datasets: Something-Something (V1 & V2) (Goyal et al. 2017), Kinetics-600 (Carreira et al. 2018), UCF101 (Soomro, Zamir, and Shah 2012), HMDB51 (Kuehne et al. 2011), Multi-Moments in Time (Monfort et al. 2019), and Jester.
Dataset Splits | No | The paper mentions evaluating on 'validation' sets (e.g., 'Val Top-1 Val Top-5' in tables) for datasets like Something V1 and Kinetics-600, and refers to using 'training videos' for Kinetics-600, but it does not explicitly state the specific training/validation/test split percentages or sample counts for any of the datasets used.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using a 'mini-batch stochastic gradient descent' optimizer and pre-trained models, but it does not provide specific version numbers for any software components, libraries, or programming languages used (e.g., Python version, PyTorch version).
Experiment Setup | Yes | We set the dropout rate (Srivastava et al. 2014) to 0.5 and the weight decay to 5e-4. We use mini-batch stochastic gradient descent (Bottou 2010) with a momentum of 0.9 as our optimizer. The initial learning rate is set to 0.005 and divided by 10 at epochs 10 and 20; training stops at 25 epochs.
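
The reported setup maps directly onto standard PyTorch components. Below is a minimal sketch, assuming a placeholder model and an empty epoch loop; only the hyperparameter values (dropout 0.5, weight decay 5e-4, SGD momentum 0.9, initial learning rate 0.005 divided by 10 at epochs 10 and 20, 25 epochs total) come from the paper, while the model, class count, and loop body are illustrative stand-ins. The authors' actual training code is in the linked repository.

```python
# Minimal PyTorch sketch of the reported training setup. The model is a
# placeholder (assumption), not the TIN backbone; only the hyperparameter
# values are taken from the paper.
import torch
import torch.nn as nn

model = nn.Sequential(          # stand-in for the TIN backbone
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),          # dropout rate reported in the paper
    nn.Linear(512, 174),        # 174 classes, e.g., Something-Something V1
)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.005,                   # initial learning rate
    momentum=0.9,
    weight_decay=5e-4,
)
# Divide the learning rate by 10 at epochs 10 and 20.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10, 20], gamma=0.1
)

for epoch in range(25):         # training stops at 25 epochs
    # ... one pass over the training set would go here ...
    scheduler.step()
```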