Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Two-Stream Convolutional Networks for Action Recognition in Videos
Authors: Karen Simonyan, Andrew Zisserman
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art. |
| Researcher Affiliation | Academia | Karen Simonyan Andrew Zisserman Visual Geometry Group, University of Oxford EMAIL |
| Pseudocode | No | The paper describes methods and architectures in text and figures, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured, code-like steps. |
| Open Source Code | No | Our implementation is derived from the publicly available Caffe toolbox [13], but contains a number of significant modifications, including parallel training on multiple GPUs installed in a single system. (No explicit statement of their code being released or a link to it.) |
| Open Datasets | Yes | The evaluation is performed on UCF-101 [24] and HMDB-51 [16] action recognition benchmarks, which are among the largest available annotated video datasets |
| Dataset Splits | No | The evaluation protocol is the same for both datasets: the organisers provide three splits into training and test data, and the performance is measured by the mean classification accuracy across the splits. (The paper only explicitly states train/test splits provided by organizers, not a specific validation split for their own experiments on UCF/HMDB, though a validation set is mentioned for ImageNet pre-training and implicit for fine-tuning.) |
| Hardware Specification | Yes | Training a single temporal ConvNet takes 1 day on a system with 4 NVIDIA Titan cards, which constitutes a 3.2 times speed-up over single-GPU training. |
| Software Dependencies | No | Our implementation is derived from the publicly available Caffe toolbox [13]... Optical flow is computed using the off-the-shelf GPU implementation of [2] from the OpenCV toolbox. (Specific version numbers for Caffe or OpenCV are not provided, only the names of the toolboxes.) |
| Experiment Setup | Yes | The network weights are learnt using the mini-batch stochastic gradient descent with momentum (set to 0.9). At each iteration, a mini-batch of 256 samples is constructed... The learning rate is initially set to 10^-2, and then decreased according to a fixed schedule... when training a ConvNet from scratch, the rate is changed to 10^-3 after 50K iterations, then to 10^-4 after 70K iterations, and training is stopped after 80K iterations. In the fine-tuning scenario, the rate is changed to 10^-3 after 14K iterations, and training stopped after 20K iterations. |
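
The fixed learning-rate schedule quoted in the Experiment Setup row can be written as a small step function. The following is a minimal sketch, not the authors' Caffe implementation: the function name `learning_rate`, the constant names, and the treatment of the boundary iterations (the rate is assumed to switch exactly at the stated iteration count) are assumptions made for illustration. The numeric values come from the quoted text.

```python
# Hyperparameters as quoted in the Experiment Setup row.
MOMENTUM = 0.9
BATCH_SIZE = 256

# Each entry is (iteration at which this phase ends, learning rate during the phase).
# Assumption: a rate "changed after N iterations" applies from iteration N onward.
SCRATCH_SCHEDULE = [(50_000, 1e-2), (70_000, 1e-3), (80_000, 1e-4)]
FINE_TUNE_SCHEDULE = [(14_000, 1e-2), (20_000, 1e-3)]


def learning_rate(iteration, fine_tuning=False):
    """Return the learning rate for a 0-based iteration, or None once training has stopped."""
    schedule = FINE_TUNE_SCHEDULE if fine_tuning else SCRATCH_SCHEDULE
    for phase_end, rate in schedule:
        if iteration < phase_end:
            return rate
    return None  # past the final iteration: training is stopped


if __name__ == "__main__":
    for it in (0, 50_000, 70_000, 80_000):
        print("scratch", it, learning_rate(it))
    for it in (0, 14_000, 20_000):
        print("fine-tune", it, learning_rate(it, fine_tuning=True))
```

Keeping each schedule as a list of (phase end, rate) pairs means the from-scratch and fine-tuning scenarios share one lookup routine, and the stopping point is simply the end of the last phase.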