FASTER Recurrent Networks for Efficient Video Classification
Authors: Linchao Zhu, Du Tran, Laura Sevilla-Lara, Yi Yang, Matt Feiszli, Heng Wang
AAAI 2020, pp. 13098-13105 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our FASTER framework has significantly better accuracy/FLOPs trade-offs, achieving the state-of-the-art accuracy with 10× less FLOPs. |
| Researcher Affiliation | Collaboration | Facebook AI; ReLER, University of Technology Sydney; University of Edinburgh. {linchao.zhu, yi.yang}@uts.edu.au, lsevilla@ed.ac.uk, {trandu, mdf, hengwang}@fb.com |
| Pseudocode | No | The paper describes the architecture and equations for FAST-GRU, but does not provide a formally structured pseudocode or algorithm block. |
| Open Source Code | No | The paper notes that the R(2+1)D backbone code is available at 'https://github.com/facebookresearch/VMZ', but does not state that the code for the proposed FASTER framework itself is open-source or publicly available. |
| Open Datasets | Yes | We choose the Kinetics (Kay et al. 2017) dataset as the major testbed for FASTER. ... We also report results on UCF-101 (Soomro, Zamir, and Shah 2012) and HMDB-51 (Kuehne et al. 2011). |
| Dataset Splits | No | The paper states that for Kinetics, 'We report top-1 accuracy on the validation set as labels on the testing set is not public available.' and for UCF-101 and HMDB-51, 'we use Kinetics for pre-training and report mean accuracy on three testing splits.' While it implies the existence of training, validation, and testing sets, it does not provide specific percentages, counts, or explicit details of the train/validation/test splits needed for reproduction. |
| Hardware Specification | Yes | We measure the runtime speed of different methods on a TITAN X GPU with an Intel i7 CPU. |
| Software Dependencies | No | The paper mentions various models and techniques (e.g., CNNs, RNNs, GRU, LSTM, ResNet, SoftMax loss, Batch Normalization, ReLU) and a learning rate schedule, but does not provide specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | Setups for clip-level backbones. We mostly follow the procedure in (Tran et al. 2018) to train the clip-level backbones except the following two changes. First, we scale the input video whose shorter side is randomly sampled in [256, 320] pixels, following (Wang et al. 2018). Second, we adopt the cosine learning rate schedule (Loshchilov and Hutter 2016). During training, we randomly sample L consecutive frames from a given video. ... We fix the total number of frames processed to be 256, i.e., N × L = 256. |
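
As a rough illustration of the setup quoted above, the sketch below shows how the random short-side scaling in [256, 320], the sampling of N clips of L consecutive frames with N × L = 256, and the cosine learning rate schedule (Loshchilov and Hutter 2016) could be expressed in PyTorch. This is a minimal sketch, not the authors' released code: the tensor layout, the helper names `random_short_side_scale` and `sample_clips`, the placeholder model, and the optimizer hyperparameters other than those quoted are assumptions.

```python
# Minimal sketch of the quoted training setup (assumed PyTorch implementation).
import random
import torch
import torch.nn.functional as F

TOTAL_FRAMES = 256  # the quoted constraint N x L = 256


def random_short_side_scale(frames, min_size=256, max_size=320):
    """Rescale frames (L, C, H, W) so the shorter side is a random value in [256, 320]."""
    target = random.randint(min_size, max_size)
    _, _, h, w = frames.shape
    scale = target / min(h, w)
    return F.interpolate(frames, scale_factor=scale, mode="bilinear", align_corners=False)


def sample_clips(video, clip_len):
    """Randomly sample N = 256 // L clips of L consecutive frames (assumes T >= L).

    video: float tensor of shape (T, C, H, W); returns a tensor of shape (N, L, C, H, W).
    """
    num_clips = TOTAL_FRAMES // clip_len
    t = video.shape[0]
    starts = [random.randint(0, t - clip_len) for _ in range(num_clips)]
    return torch.stack([video[s:s + clip_len] for s in starts])


# Cosine learning rate schedule via PyTorch's built-in scheduler.
model = torch.nn.Linear(16, 2)  # placeholder for the clip-level backbone (e.g., R(2+1)D)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # assumed values
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=45)  # T_max is an assumed value
```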