VideoCapsuleNet: A Simplified Network for Action Detection

Authors: Kevin Duarte, Yogesh Rawat, Mubarak Shah

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed network achieves state-of-the-art performance on multiple action detection datasets, including UCF-Sports, J-HMDB, and UCF-101 (24 classes), with an impressive 20% improvement on UCF-101 and 15% improvement on J-HMDB in terms of v-mAP scores. |
| Researcher Affiliation | Academia | Kevin Duarte (kevin_duarte@knights.ucf.edu), Yogesh S. Rawat (yogesh@crcv.ucf.edu), Mubarak Shah (shah@crcv.ucf.edu) — Center for Research in Computer Vision, University of Central Florida, Orlando, FL 32816. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | "We measure the performance of our network on three datasets: UCF-Sports [15], J-HMDB [16], UCF-101 [17]." |
| Dataset Splits | Yes | "The UCF-Sports dataset consists of 150 videos from 10 action classes. All videos contain spatio-temporal annotations in the form of frame-level bounding boxes and we follow the standard training/testing split used by [21]." |
| Hardware Specification | Yes | "Although capsule networks tend to be computationally expensive (due to the routing-by-agreement), capsule-pooling allows VideoCapsuleNet to run on a single Titan X GPU using a batch size of 8." |
| Software Dependencies | No | "We implement VideoCapsuleNet using TensorFlow [12]." No version numbers for the dependencies are given. |
| Experiment Setup | Yes | "The network was trained using the Adam optimizer [14], with a learning rate of 0.0001. Due to the size of VideoCapsuleNet, a batch size of 8 was used during training." |
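The experiment-setup row reports the Adam optimizer with a learning rate of 0.0001. Since the authors' TensorFlow training code is not released, the sketch below only illustrates the standard Adam update rule (Kingma & Ba) for a single scalar parameter, plugging in the paper's reported learning rate; all function and variable names are illustrative, not taken from the paper.

```python
# Minimal single-parameter Adam update, using the learning rate the paper
# reports (0.0001). Illustrative sketch only -- not the authors' code.

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for scalar parameter theta given gradient grad at step t."""
    m = beta1 * m + (1 - beta1) * grad        # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Toy loss f(theta) = theta**2, so grad = 2 * theta; run three steps.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    theta, m, v = adam_step(theta, grad=2.0 * theta, m=m, v=v, t=t)
```

With the small learning rate of 1e-4, each step moves the parameter by roughly the learning rate, so after three steps theta has decreased only slightly from 1.0.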