MAU: A Motion-Aware Unit for Video Prediction and Beyond
Authors: Zheng Chang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Yan Ye, Xinguang Xiang, Wen Gao
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the MAU on both video prediction and early action recognition tasks. Experimental results show that the MAU outperforms the state-of-the-art methods on both tasks. |
| Researcher Affiliation | Collaboration | Zheng Chang, University of Chinese Academy of Sciences and Institute of Computing Technology, Chinese Academy of Sciences (changzheng18@mails.ucas.ac.cn); Xinfeng Zhang, School of Computer Science and Technology, University of Chinese Academy of Sciences (xfzhang@ucas.ac.cn); Shanshe Wang, Institute of Digital Media, Peking University (sswang@pku.edu.cn); Siwei Ma, Institute of Digital Media, Information Technology R&D Innovation Center, Peking University (swma@pku.edu.cn); Yan Ye, Alibaba Group (yan.ye@alibaba-inc.com); Xinguang Xiang, School of Computer Science and Engineering, Nanjing University of Science and Technology (xgxiang@njust.edu.cn); Wen Gao, Institute of Digital Media, Peking University and University of Chinese Academy of Sciences (wgao@pku.edu.cn) |
| Pseudocode | No | The paper provides mathematical formulations and a diagram (Fig. 1) of the model structure, but no explicit pseudocode or algorithm blocks. (A hedged rollout sketch, with assumed names, follows the table.) |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the supplemental material. |
| Open Datasets | Yes | We evaluate the proposed MAU on five datasets: the Moving MNIST dataset [8], the KITTI dataset [31], the Caltech Pedestrian dataset [32], the Town Centre XVID dataset [33] and the Something Something V2 dataset [34]. |
| Dataset Splits | Yes | The training set contains 168,913 videos and the validation set consists of 24,777 videos. Table 1: Experimental settings. MAUs denotes the number of stacked MAUs. Train and Test denote the number of input and output frames during training and testing. |
| Hardware Specification | No | The paper states 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See section 4.', but section 4 does not contain specific hardware details like GPU or CPU models. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'Yolov5s model', but does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | The number of hidden state channels in the MAUs is set to 64, and the integrated convolutional operators use a kernel size of 5×5 and stride 1. All experiments are optimized with the Adam optimizer. To stabilize the training process, we employ layer normalization operators after each integrated convolutional layer in MAUs. Table 1: Experimental settings. MAUs denotes the number of stacked MAUs. Train and Test denote the number of input and output frames during training and testing. (A hedged sketch of this configuration follows the table.) |
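
Because the paper gives equations and a structure diagram but no pseudocode (see the Pseudocode row), the following is a minimal, hedged sketch of the stacked-recurrent rollout that the quoted settings imply. It assumes PyTorch; `PlaceholderCell`, `StackedPredictor`, and the default stack depth are illustrative names and values, and the cell body is a plain convolutional update, not the paper's attention-and-fusion MAU.

```python
import torch
import torch.nn as nn


class PlaceholderCell(nn.Module):
    """Stand-in for the paper's MAU cell.

    A plain convolutional recurrent update, NOT the authors' attention-and-
    fusion design; it only mirrors the quoted hyperparameters (64 channels,
    5x5 kernel, stride 1, layer normalization after the convolution).
    """

    def __init__(self, channels: int = 64, kernel_size: int = 5):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size,
                              stride=1, padding=kernel_size // 2)
        # GroupNorm with one group normalizes each sample over (C, H, W),
        # i.e. a layer-norm-style normalization for conv feature maps.
        self.norm = nn.GroupNorm(1, channels)

    def forward(self, x, state):
        if state is None:
            state = torch.zeros_like(x)
        state = torch.tanh(self.norm(self.conv(torch.cat([x, state], dim=1))))
        return state, state


class StackedPredictor(nn.Module):
    """Stack several cells and roll them out over time, echoing Table 1's
    'MAUs = number of stacked MAUs' setting."""

    def __init__(self, num_cells: int = 4, channels: int = 64):
        # num_cells default is an assumption; Table 1 lists per-dataset values.
        super().__init__()
        self.cells = nn.ModuleList(PlaceholderCell(channels)
                                   for _ in range(num_cells))

    def forward(self, frames, horizon):
        # frames: (batch, time, channels, height, width), already embedded.
        states = [None] * len(self.cells)
        outputs, x = [], None
        for t in range(frames.size(1) + horizon - 1):
            # Consume ground-truth frames first, then feed predictions back.
            x = frames[:, t] if t < frames.size(1) else x
            for i, cell in enumerate(self.cells):
                x, states[i] = cell(x, states[i])
            outputs.append(x)
        # The last `horizon` outputs are the predicted future frames.
        return torch.stack(outputs[-horizon:], dim=1)
```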
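
The Experiment Setup row additionally quotes Adam optimization with layer normalization after each integrated convolution. A hedged training-step sketch, reusing `StackedPredictor` from above, might look as follows; the learning rate and the MSE objective are assumptions, not values quoted from the paper.

```python
import torch

# Hedged illustration of the quoted setup: 64 hidden channels, 5x5 kernels
# with stride 1 (inside the cells above), and the Adam optimizer.
model = StackedPredictor(num_cells=4, channels=64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumption
criterion = torch.nn.MSELoss()  # loss choice is an assumption


def train_step(frames: torch.Tensor, horizon: int) -> float:
    """One optimization step on a batch of embedded frame sequences."""
    inputs, targets = frames[:, :-horizon], frames[:, -horizon:]
    optimizer.zero_grad()
    loss = criterion(model(inputs, horizon), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```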