MAU: A Motion-Aware Unit for Video Prediction and Beyond

Authors: Zheng Chang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Yan Ye, Xinguang Xiang, Wen Gao

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the MAU on both video prediction and early action recognition tasks. Experimental results show that the MAU outperforms the state-of-the-art methods on both tasks.
Researcher Affiliation | Collaboration | Zheng Chang (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences; changzheng18@mails.ucas.ac.cn); Xinfeng Zhang (School of Computer Science and Technology, University of Chinese Academy of Sciences; xfzhang@ucas.ac.cn); Shanshe Wang (Institute of Digital Media, Peking University; sswang@pku.edu.cn); Siwei Ma (Institute of Digital Media, Information Technology R&D Innovation Center, Peking University; swma@pku.edu.cn); Yan Ye (Alibaba Group; yan.ye@alibaba-inc.com); Xinguang Xiang (School of Computer Science and Engineering, Nanjing University of Science and Technology; xgxiang@njust.edu.cn); Wen Gao (Institute of Digital Media, Peking University; University of Chinese Academy of Sciences; wgao@pku.edu.cn)
Pseudocode | No | The paper provides mathematical formulations and a diagram (Fig. 1) of the model structure, but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the supplemental material.
Open Datasets | Yes | We evaluate the proposed MAU on five datasets: the Moving MNIST dataset [8], the KITTI dataset [31], the Caltech Pedestrian dataset [32], the Town Centre XVID dataset [33] and the Something Something V2 dataset [34].
Dataset Splits | Yes | The training set contains 168,913 videos and the validation set consists of 24,777 videos. Table 1: Experimental settings. "MAUs" denotes the number of stacked MAUs; "Train" and "Test" denote the numbers of input and output frames during training and testing.
Hardware Specification | No | The paper states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See section 4.', but section 4 does not contain specific hardware details such as GPU or CPU models.
Software Dependencies | No | The paper mentions the 'Adam optimizer' and the 'Yolov5s model', but does not list its software dependencies or their version numbers.
Experiment Setup | Yes | The number of hidden-state channels of the MAUs is set to 64, and the integrated convolutional operators are set with a kernel size of 5×5 and stride 1. All experiments are optimized with the Adam optimizer. To stabilize the training process, we employ layer normalization operators after each integrated convolutional layer in MAUs. Table 1: Experimental settings. "MAUs" denotes the number of stacked MAUs; "Train" and "Test" denote the numbers of input and output frames during training and testing.
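
For concreteness, below is a minimal PyTorch sketch of a convolutional block matching the quoted settings: 64 hidden-state channels, a 5×5 kernel with stride 1, layer normalization after each convolution, and Adam optimization. The padding, spatial resolution, number of stacked blocks, and learning rate are illustrative assumptions not stated in the paper, and this is not the authors' MAU implementation (which additionally models motion-aware state transitions).

    import torch
    import torch.nn as nn

    class ConvBlock(nn.Module):
        """One convolutional block following the quoted settings (hypothetical sketch)."""
        def __init__(self, channels=64, height=16, width=16):
            super().__init__()
            # 5x5 kernel, stride 1; padding 2 (an assumption) preserves spatial size.
            self.conv = nn.Conv2d(channels, channels, kernel_size=5, stride=1, padding=2)
            # Layer normalization after each convolution, per the quoted setup.
            self.norm = nn.LayerNorm([channels, height, width])

        def forward(self, x):
            return self.norm(self.conv(x))

    # Stack of blocks standing in for the stacked MAUs (the count of 4 is hypothetical).
    model = nn.Sequential(*[ConvBlock() for _ in range(4)])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate not stated in the paper

    x = torch.randn(8, 64, 16, 16)  # (batch, channels, height, width)
    y = model(x)                    # output has the same shape as x

Here nn.LayerNorm([channels, height, width]) normalizes over the full (C, H, W) feature map, which is one common reading of "layer normalization operators after each integrated convolutional layer"; the paper does not pin down the normalized axes.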