FASTER Recurrent Networks for Efficient Video Classification

Authors: Linchao Zhu, Du Tran, Laura Sevilla-Lara, Yi Yang, Matt Feiszli, Heng Wang (pp. 13098-13105)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our FASTER framework has significantly better accuracy/FLOPs trade-offs, achieving the state-of-the-art accuracy with 10× less FLOPs.
Researcher Affiliation | Collaboration | Facebook AI; ReLER, University of Technology Sydney; University of Edinburgh. {linchao.zhu, yi.yang}@uts.edu.au, lsevilla@ed.ac.uk, {trandu, mdf, hengwang}@fb.com
Pseudocode | No | The paper describes the architecture and equations for FAST-GRU, but does not provide a formally structured pseudocode or algorithm block. (The standard GRU recurrence that FAST-GRU builds on is sketched below the table for reference.)
Open Source Code | No | The paper mentions that the R(2+1)D backbone code is available at 'https://github.com/facebookresearch/VMZ', but does not state that the code for the proposed FASTER framework is open-source or publicly available.
Open Datasets | Yes | We choose the Kinetics (Kay et al. 2017) dataset as the major testbed for FASTER. ... We also report results on UCF-101 (Soomro, Zamir, and Shah 2012) and HMDB-51 (Kuehne et al. 2011).
Dataset Splits | No | The paper states that for Kinetics, 'We report top-1 accuracy on the validation set as labels on the testing set is not public available,' and that for UCF-101 and HMDB-51, 'we use Kinetics for pre-training and report mean accuracy on three testing splits.' While this implies the existence of training, validation, and testing sets, the paper does not provide the specific percentages, counts, or explicit split details needed for reproduction.
Hardware Specification | Yes | We measure the runtime speed of different methods on a TITAN X GPU with an Intel i7 CPU.
Software Dependencies | No | The paper mentions various models and techniques (e.g., CNNs, RNNs, GRU, LSTM, ResNet, SoftMax loss, Batch Normalization, ReLU) and a learning rate schedule, but does not provide specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | Setups for clip-level backbones. We mostly follow the procedure in (Tran et al. 2018) to train the clip-level backbones except the following two changes. First, we scale the input video whose shorter side is randomly sampled in [256, 320] pixels, following (Wang et al. 2018). Second, we adopt the cosine learning rate schedule (Loshchilov and Hutter 2016). During training, we randomly sample L consecutive frames from a given video. ... We fix the total number of frames processed to be 256, i.e., N × L = 256. (A hedged code sketch of this preprocessing and schedule also follows the table.)
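The Pseudocode row refers to the FAST-GRU equations; those are not reproduced here, but as a reference point the recurrence below is the generic GRU (Cho et al. 2014) that FAST-GRU modifies, not the authors' variant:

```latex
% Standard GRU update for input x_t and previous hidden state h_{t-1};
% \sigma is the logistic sigmoid and \odot is element-wise multiplication.
% Generic recurrence only -- FAST-GRU's modifications are not shown.
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z)                     && \text{(update gate)} \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r)                     && \text{(reset gate)} \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)  && \text{(candidate state)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t         && \text{(new hidden state)}
\end{aligned}
```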
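For the Experiment Setup row, here is a minimal sketch of the two quoted training changes, assuming a PyTorch-style pipeline. Only the shorter-side range [256, 320], the cosine schedule, the sampling of L consecutive frames, and the constraint N × L = 256 come from the paper; every function name, the (N, L) choice, and the backbone/optimizer placeholders are hypothetical.

```python
# A minimal sketch under the assumptions stated above, not the authors' released code.
import random

import torch
import torch.nn.functional as F

N, L = 8, 32  # hypothetical split; the paper only fixes the product N * L = 256


def sample_clip(video: torch.Tensor, clip_len: int = L) -> torch.Tensor:
    """Randomly sample `clip_len` consecutive frames from a (T, C, H, W) float tensor."""
    start = random.randint(0, video.shape[0] - clip_len)
    return video[start:start + clip_len]


def random_short_side_scale(clip: torch.Tensor, min_size: int = 256, max_size: int = 320) -> torch.Tensor:
    """Resize so the shorter spatial side is a random value in [min_size, max_size] pixels."""
    target = random.randint(min_size, max_size)
    _, _, h, w = clip.shape
    if h <= w:
        new_h, new_w = target, int(round(w * target / h))
    else:
        new_h, new_w = int(round(h * target / w)), target
    return F.interpolate(clip, size=(new_h, new_w), mode="bilinear", align_corners=False)


# Cosine learning-rate schedule (Loshchilov and Hutter 2016). The backbone and
# optimizer hyperparameters are placeholders, not values reported in the paper.
backbone = torch.nn.Conv3d(3, 64, kernel_size=3)  # stand-in for the R(2+1)D backbone
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... train on randomly sampled, rescaled clips here ...
    scheduler.step()
```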