Two-Stream Convolutional Networks for Action Recognition in Videos

Authors: Karen Simonyan, Andrew Zisserman

NeurIPS 2014

Reproducibility assessment (variable: result, followed by the supporting LLM response):

Research Type: Experimental
  "Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art."

Researcher Affiliation: Academia
  "Karen Simonyan, Andrew Zisserman. Visual Geometry Group, University of Oxford. {karen,az}@robots.ox.ac.uk"

Pseudocode: No
  The paper describes its methods and architectures in text and figures, but includes no explicitly labelled "Pseudocode" or "Algorithm" blocks, nor structured, code-like steps.

Open Source Code: No
  "Our implementation is derived from the publicly available Caffe toolbox [13], but contains a number of significant modifications, including parallel training on multiple GPUs installed in a single system." (There is no explicit statement that their code is released, and no link to it.)

Open Datasets: Yes
  "The evaluation is performed on UCF-101 [24] and HMDB-51 [16] action recognition benchmarks, which are among the largest available annotated video datasets."

Dataset Splits: No
  "The evaluation protocol is the same for both datasets: the organisers provide three splits into training and test data, and the performance is measured by the mean classification accuracy across the splits." (The paper only states the train/test splits provided by the organisers; no separate validation split is described for their own UCF-101/HMDB-51 experiments, although a validation set is mentioned for ImageNet pre-training and is implicit for fine-tuning.)

Hardware Specification: Yes
  "Training a single temporal ConvNet takes 1 day on a system with 4 NVIDIA Titan cards, which constitutes a 3.2 times speed-up over single-GPU training."

Software Dependencies: No
  "Our implementation is derived from the publicly available Caffe toolbox [13]... Optical flow is computed using the off-the-shelf GPU implementation of [2] from the OpenCV toolbox." (Only the toolbox names are given; no version numbers for Caffe or OpenCV are provided.)

Experiment Setup: Yes
  "The network weights are learnt using the mini-batch stochastic gradient descent with momentum (set to 0.9). At each iteration, a mini-batch of 256 samples is constructed... The learning rate is initially set to 10^-2, and then decreased according to a fixed schedule... when training a ConvNet from scratch, the rate is changed to 10^-3 after 50K iterations, then to 10^-4 after 70K iterations, and training is stopped after 80K iterations. In the fine-tuning scenario, the rate is changed to 10^-3 after 14K iterations, and training stopped after 20K iterations."
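The fixed learning-rate schedule quoted in the Experiment Setup entry can be sketched as a simple step function. This is a minimal illustration using only the iteration thresholds stated in the paper, not the authors' actual Caffe solver configuration:

```python
def learning_rate(iteration, fine_tuning=False):
    """Step schedule from the quoted setup: lr starts at 1e-2 and is
    divided by 10 at the fixed iteration counts given in the paper."""
    if fine_tuning:
        # Fine-tuning: drop to 1e-3 after 14K iterations (stop at 20K).
        return 1e-2 if iteration < 14_000 else 1e-3
    # From scratch: 1e-3 after 50K, 1e-4 after 70K (stop at 80K).
    if iteration < 50_000:
        return 1e-2
    if iteration < 70_000:
        return 1e-3
    return 1e-4
```

The stopping points (80K from scratch, 20K when fine-tuning) are enforced by the training loop, not by the schedule itself.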
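The evaluation protocol noted under Dataset Splits (mean classification accuracy over the three official train/test splits) amounts to a simple average. A small sketch, with hypothetical per-split accuracies chosen purely for illustration:

```python
def mean_split_accuracy(split_accuracies):
    """Mean classification accuracy over a dataset's official splits;
    the quoted protocol uses the three splits provided by the organisers."""
    return sum(split_accuracies) / len(split_accuracies)

# Hypothetical per-split accuracies (not results from the paper).
overall = mean_split_accuracy([0.88, 0.87, 0.89])
```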