Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning End-to-end Video Classification with Rank-Pooling
Authors: Basura Fernando, Stephen Gould
ICML 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our approach on action and activity recognition tasks. We conduct experiments on action and activity recognition tasks in video using two real-world datasets, and compare our approach against some strong baseline methods. |
| Researcher Affiliation | Academia | Basura Fernando EMAIL Research School of Engineering, The Australian National University, ACT 2601, Australia Stephen Gould EMAIL Research School of Computer Science, The Australian National University, ACT 2601, Australia |
| Pseudocode | No | The paper does not contain a pseudocode block or a clearly labeled algorithm. |
| Open Source Code | No | The paper mentions using 'publicly available code (Fernando et al., 2015)' for a baseline method, but it does not state that the source code for the methodology described in *this* paper is open-source or provide a link. |
| Open Datasets | Yes | First, we use UCF-sports dataset (Rodriguez et al., 2008) for the task of action classification. Second, we use the Hollywood2 dataset (Laptev et al., 2008) for the task of activity recognition. |
| Dataset Splits | Yes | We use provided train-test splits for training and testing. It has 1,707 videos in total with a pre-defined split of 823 training videos and 884 test videos. |
| Hardware Specification | Yes | Using the full gradient optimization is ten times slower than the approximate method, resulting in processing videos at 5 frames per second versus 50 frames per second (for the approximate method) during training on a Titan-X GPU. |
| Software Dependencies | No | The paper mentions software like 'Caffe reference model' and 'MatConvNet', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We initialize the network with the Caffe reference model and use a variable learning rate starting from 0.01 down to 0.0001 over 60 epochs. We also use a weight decay of 0.0005 on an L2-regularizer over the model parameters. |
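
The Experiment Setup row gives the endpoints of the learning rate (0.01 down to 0.0001 over 60 epochs) and the weight decay (0.0005), but not the shape of the decay. The sketch below is one possible reading, assuming a geometric (log-spaced) schedule with one rate per epoch; that assumption, along with the function names `log_spaced_learning_rates` and `sgd_step`, is illustrative and not taken from the paper.

```python
# Hyperparameters quoted in the Experiment Setup row.
START_LR = 0.01
END_LR = 0.0001
EPOCHS = 60
WEIGHT_DECAY = 0.0005


def log_spaced_learning_rates(start=START_LR, end=END_LR, epochs=EPOCHS):
    """Return one learning rate per epoch, geometrically spaced from start to end.

    ASSUMPTION: the paper only says the rate varies from 0.01 to 0.0001;
    the geometric spacing used here is a guess at the schedule's shape.
    """
    if epochs < 2:
        return [start]
    ratio = (end / start) ** (1.0 / (epochs - 1))
    return [start * ratio ** i for i in range(epochs)]


def sgd_step(weight, grad, lr, weight_decay=WEIGHT_DECAY):
    """Plain SGD update on a scalar weight with an L2 regularizer.

    The L2 term contributes weight_decay * weight to the gradient.
    """
    return weight - lr * (grad + weight_decay * weight)


if __name__ == "__main__":
    rates = log_spaced_learning_rates()
    print(f"epoch 1 lr = {rates[0]:.6f}, epoch {EPOCHS} lr = {rates[-1]:.6f}")
```

A linear or step-wise decay between the same endpoints would be equally consistent with the quoted text, so reproducing the reported results may require trying more than one schedule.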