V4D: 4D Convolutional Neural Networks for Video-level Representation Learning

Authors: Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Limin Wang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on three video recognition benchmarks, where V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.
Researcher Affiliation | Collaboration | Shiwen Zhang, Sheng Guo, Weilin Huang & Matthew R. Scott: Malong Technologies, Shenzhen, China; Shenzhen Malong Artificial Intelligence Research Center, Shenzhen, China ({shizhang,sheng,whuang,mscott}@malong.com). Limin Wang: State Key Laboratory for Novel Software Technology, Nanjing University, China (lmwang@nju.edu.cn).
Pseudocode | Yes | Algorithm 1: V4D Inference. Networks: the structure of the network is divided into two sub-networks by the first 4D block, namely N3D and N4D. Input: U_infer action units from a holistic video, {A_1, A_2, ..., A_U_infer}. Output: the video-level prediction. (A minimal code sketch of this inference interface is given below the table.)
Open Source Code | No | The paper does not provide any explicit statements about the release of source code, nor does it include links to a code repository for the described methodology.
Open Datasets | Yes | We conduct experiments on three standard benchmarks: Mini-Kinetics (Xie et al., 2018), Kinetics-400 (Carreira & Zisserman, 2017), and Something-Something-v1 (Goyal et al., 2017).
Dataset Splits | Yes | Our version of Kinetics-400 contains 240,436 and 19,796 videos in the training and validation subsets, respectively. Our version of Mini-Kinetics contains 78,422 videos for training and 4,994 videos for validation. Each video has around 300 frames. Something-Something-v1 contains 108,499 videos in total, with 86,017 for training, 11,522 for validation, and 10,960 for testing.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup | Yes | We utilize an SGD optimizer with an initial learning rate of 0.01; weight decay is set to 10^-5 with a momentum of 0.9. The learning rate drops by a factor of 10 at epochs 35, 60, and 80, and the model is trained for 100 epochs in total. (A hedged code sketch of this setup is given below the table.)
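
The quoted Algorithm 1 header only states the interface of V4D inference: a video is split into U_infer action units, the network is divided at the first 4D block into N3D and N4D, and the output is a video-level prediction. Below is a minimal PyTorch-style sketch of one plausible reading of that interface; the module names n3d and n4d, the stacking of unit-level features along a new "unit" axis, and the per-unit looping are assumptions, and the paper's actual aggregation across units may differ.

```python
import torch
import torch.nn as nn

class V4DInference(nn.Module):
    """Hypothetical wrapper around the two sub-networks named in Algorithm 1."""

    def __init__(self, n3d: nn.Module, n4d: nn.Module):
        super().__init__()
        self.n3d = n3d  # layers before the first 4D block (per-unit 3D CNN)
        self.n4d = n4d  # remaining layers: 4D blocks and classifier head

    @torch.no_grad()
    def forward(self, units: torch.Tensor) -> torch.Tensor:
        # units: (U_infer, C, T, H, W) -- action units sampled from one video.
        feats = [self.n3d(units[i:i + 1]) for i in range(units.shape[0])]
        # Stack unit-level features along a new "unit" axis so the 4D blocks
        # in N4D can operate across action units (assumption).
        feats = torch.stack(feats, dim=1)  # (1, U_infer, C', T', H', W')
        return self.n4d(feats)             # video-level prediction
```

Under this reading, N3D is shared across action units and only N4D sees the unit axis, which is what makes the number of inference units a free parameter in the sketch.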
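The training setup quoted in the Experiment Setup row can likewise be made concrete. The sketch below takes only the stated hyperparameters (SGD, initial learning rate 0.01, momentum 0.9, weight decay 10^-5, learning rate divided by 10 at epochs 35, 60, and 80, 100 epochs in total) from the paper; the model and data are stand-ins, not the V4D network or the Kinetics pipeline.

```python
import torch
from torch import nn

# Placeholders: the real V4D backbone and video data pipeline are not shown.
model = nn.Linear(16, 400)                       # stand-in for the V4D network
data = [(torch.randn(8, 16), torch.randint(0, 400, (8,))) for _ in range(4)]

# Hyperparameters quoted above: SGD, lr 0.01, momentum 0.9, weight decay 1e-5.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-5)
# Learning rate divided by 10 at epochs 35, 60, and 80.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[35, 60, 80],
                                                 gamma=0.1)

for epoch in range(100):                         # trained for 100 epochs total
    for clips, labels in data:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(clips), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                             # advance the step schedule
```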