V4D: 4D Convolutional Neural Networks for Video-level Representation Learning
Authors: Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Limin Wang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on three video recognition benchmarks, where V4D achieves excellent results, surpassing recent 3D CNNs by a large margin. |
| Researcher Affiliation | Collaboration | Shiwen Zhang, Sheng Guo, Weilin Huang & Matthew R. Scott: Malong Technologies, Shenzhen, China; Shenzhen Malong Artificial Intelligence Research Center, Shenzhen, China ({shizhang,sheng,whuang,mscott}@malong.com). Limin Wang: State Key Laboratory for Novel Software Technology, Nanjing University, China (lmwang@nju.edu.cn). |
| Pseudocode | Yes | Algorithm 1: V4D Inference. Networks: the network is divided into two sub-networks by the first 4D Block, namely N3D and N4D. Input: U_infer action units from a holistic video, {A_1, A_2, ..., A_U_infer}. Output: the video-level prediction. |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code, nor does it include links to a code repository for the described methodology. |
| Open Datasets | Yes | We conduct experiments on three standard benchmarks: Mini-Kinetics (Xie et al., 2018), Kinetics-400 (Carreira & Zisserman, 2017), and Something-Something-v1 (Goyal et al., 2017). |
| Dataset Splits | Yes | Our version of Kinetics-400 contains 240,436 and 19,796 videos in the training subset and validation subset, respectively. Our version of Mini-Kinetics contains 78,422 videos for training, and 4,994 videos for validation. Each video has around 300 frames. Something-Something-v1 contains 108,499 videos in total, with 86,017 for training, 11,522 for validation, and 10,960 for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | We utilize an SGD optimizer with an initial learning rate of 0.01; weight decay is set to 10⁻⁵ with a momentum of 0.9. The learning rate drops by a factor of 10 at epochs 35, 60, and 80, and the model is trained for 100 epochs in total. |
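The inference procedure quoted in the Pseudocode row can be sketched in framework-free Python. This is a hedged illustration of the control flow only: the functions `n3d` and `n4d` below are toy placeholders standing in for the paper's two sub-networks, not the actual architectures.

```python
# Sketch of Algorithm 1 (V4D inference), assuming the paper's split of the
# network into N3D (run independently on each action unit) and N4D (fuses the
# per-unit features into a video-level prediction). Toy stand-in networks.

def n3d(action_unit):
    # Placeholder per-unit 3D sub-network: here just the mean of the clip values.
    return sum(action_unit) / len(action_unit)

def n4d(unit_features):
    # Placeholder 4D sub-network: here a simple average over units.
    return sum(unit_features) / len(unit_features)

def v4d_inference(action_units):
    """Video-level prediction from U_infer action units {A_1, ..., A_U_infer}."""
    unit_features = [n3d(unit) for unit in action_units]  # N3D on each unit
    return n4d(unit_features)                             # N4D fuses all units

# Example: three action units, each a list of per-frame values.
units = [[1.0, 2.0], [3.0, 5.0], [2.0, 4.0]]
prediction = v4d_inference(units)
```

The key structural point the algorithm conveys is that N3D is applied per action unit while N4D sees all units jointly, so video-level context enters only in the second stage.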
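The step-decay schedule quoted in the Experiment Setup row can be written out as a small framework-free sketch: initial learning rate 0.01, divided by 10 at epochs 35, 60, and 80, over 100 epochs. Whether the milestones are 0- or 1-indexed is an assumption here; momentum (0.9) and weight decay (10⁻⁵) are listed as constants for completeness but are consumed by the optimizer, not the schedule.

```python
# Hedged sketch of the paper's training hyperparameters and LR step schedule.
BASE_LR = 0.01
MOMENTUM = 0.9          # consumed by SGD, shown for completeness
WEIGHT_DECAY = 1e-5     # consumed by SGD, shown for completeness
MILESTONES = (35, 60, 80)
TOTAL_EPOCHS = 100

def learning_rate(epoch):
    """Learning rate at a given epoch: drops by a factor of 10 per milestone passed."""
    drops = sum(1 for m in MILESTONES if epoch >= m)
    return BASE_LR * (0.1 ** drops)

# Full schedule over training: 0.01 until epoch 35, then 0.001, 0.0001, 0.00001.
schedule = [learning_rate(e) for e in range(TOTAL_EPOCHS)]
```

In PyTorch this would typically be expressed with `torch.optim.SGD` plus `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[35, 60, 80], gamma=0.1)`, though the paper does not name its framework.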