Ultrafast Video Attention Prediction with Coupled Knowledge Distillation

Authors: Kui Fu, Peipei Shi, Yafei Song, Shiming Ge, Xiangju Lu, Jia Li

AAAI 2020, pp. 10802-10809 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the performance of our model is comparable to that of 11 state-of-the-art models in video attention prediction, while it costs only a 0.68 MB memory footprint and runs at about 10,106 FPS on GPU and 404 FPS on CPU, 206 times faster than previous models. Comprehensive experiments illustrate that our model achieves ultrafast speed with attention prediction accuracy comparable to the state of the art.
Researcher Affiliation | Collaboration | (1) State Key Laboratory of Virtual Reality Technology and Systems, SCSE, Beihang University; (2) iQIYI, Inc.; (3) National Engineering Laboratory for Video Technology, School of EE&CS, Peking University; (4) Institute of Information Engineering, Chinese Academy of Sciences
Pseudocode | No | The paper contains figures illustrating network architectures and equations, but no explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | We evaluate the proposed UVA-Net on a public dataset, AVS1K (Fu et al. 2018), an aerial video dataset for attention prediction. On DHF1K, our model is compared with state-of-the-art models including PQFT (Guo and Zhang 2010), Seo et al. (2009), Rudoy et al. (2013), Hou et al. (2009), Fang et al. (2014), OBDL (Hossein Khatoonabadi et al. 2015), AWS-D (Leboran et al. 2016), OMCNN (Jiang et al. 2018), and Two-stream (Bak et al. 2017).
Dataset Splits | No | The paper mentions training and testing on AVS1K and DHF1K, and a footnote to Table 2 states "Models test on the validation set of DHF1K", but it does not give specific train/validation/test splits (e.g., percentages or sample counts) for the experiments, nor does it cite a standard split that defines them.
Hardware Specification | Yes | All networks proposed in this paper are implemented with Tensorflow (Abadi et al. 2016) on an NVIDIA GPU 1080Ti and a six-core CPU Intel 3.4GHz.
Software Dependencies | No | The paper mentions using "Tensorflow (Abadi et al. 2016)" but does not specify version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | In the knowledge distillation step, S is trained from scratch with a learning rate of 5 × 10⁻⁴ and a batch size of 96. Adam (Kingma and Ba 2014) is employed to minimize the spatial loss L_spa and the temporal loss L_tmp.
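
To make the reported setup concrete, below is a minimal TensorFlow 2 sketch of one such distillation step under the stated hyperparameters (Adam, learning rate 5 × 10⁻⁴, batch size 96). It is a sketch under assumptions, not the authors' implementation: the report does not reproduce the paper's formulations of L_spa and L_tmp, so mean-squared error against spatial and temporal teacher attention maps stands in for both, and the tiny convolutional student is a placeholder, not the actual UVA-Net architecture (the paper also predates TF2, which this sketch uses for brevity).

    import tensorflow as tf

    # Hyperparameters quoted in the report: Adam, lr = 5e-4, batch size = 96.
    LEARNING_RATE = 5e-4
    BATCH_SIZE = 96

    # Placeholder student network; the real UVA-Net architecture is not
    # specified in this report.
    student = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                               input_shape=(128, 128, 3)),
        tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
    ])

    optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

    def spatial_loss(student_map, teacher_map):
        # Hypothetical stand-in for L_spa: pixel-wise distance between the
        # student's prediction and the spatial teacher's attention map.
        return tf.reduce_mean(tf.square(student_map - teacher_map))

    def temporal_loss(student_map, teacher_map):
        # Hypothetical stand-in for L_tmp: same form, against the temporal
        # teacher's attention map.
        return tf.reduce_mean(tf.square(student_map - teacher_map))

    @tf.function
    def distill_step(frames, spa_target, tmp_target):
        # One optimization step minimizing the joint spatial + temporal loss,
        # as the report describes.
        with tf.GradientTape() as tape:
            pred = student(frames, training=True)
            loss = spatial_loss(pred, spa_target) + temporal_loss(pred, tmp_target)
        grads = tape.gradient(loss, student.trainable_variables)
        optimizer.apply_gradients(zip(grads, student.trainable_variables))
        return loss

A training loop would then iterate distill_step over batches of 96 frames paired with the two teachers' attention maps; whether the paper weights the two losses equally is not stated in this report, so the unweighted sum above is an assumption.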