Learning End-to-end Video Classification with Rank-Pooling

Authors: Basura Fernando, Stephen Gould

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate our approach on action and activity recognition tasks. We conduct experiments on action and activity recognition tasks in video using two real-world datasets, and compare our approach against some strong baseline methods."
Researcher Affiliation | Academia | "Basura Fernando (basura.fernando@anu.edu.au), Research School of Engineering, The Australian National University, ACT 2601, Australia; Stephen Gould (stephen.gould@anu.edu.au), Research School of Computer Science, The Australian National University, ACT 2601, Australia"
Pseudocode | No | The paper does not contain a pseudocode block or a clearly labeled algorithm.
Open Source Code | No | The paper mentions using "publicly available code (Fernando et al., 2015)" for a baseline method, but it neither states that the source code for the methodology described in this paper is open source nor provides a link to it.
Open Datasets | Yes | "First, we use UCF-sports dataset (Rodriguez et al., 2008) for the task of action classification. Second, we use the Hollywood2 dataset (Laptev et al., 2008) for the task of activity recognition."
Dataset Splits | Yes | "We use provided train-test splits for training and testing. It [Hollywood2] has 1,707 videos in total with a pre-defined split of 823 training videos and 884 test videos."
Hardware Specification | Yes | "Using the full gradient optimization is ten times slower than the approximate method, resulting in processing videos at 5 frames per second versus 50 frames per second (for the approximate method) during training on a Titan-X GPU."
Software Dependencies | No | The paper mentions software such as the "Caffe reference model" and "MatConvNet", but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "We initialize the network with the Caffe reference model and use a variable learning rate starting from 0.01 down to 0.0001 over 60 epochs. We also use a weight decay of 0.0005 on an L2-regularizer over the model parameters."
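
To make the quoted setup concrete, here is a minimal Python/NumPy sketch of the reported hyperparameters. The quote does not specify the shape of the "variable learning rate" schedule, so a log-spaced decay from 0.01 to 0.0001 over 60 epochs is assumed for illustration; the sgd_step helper is likewise hypothetical and only shows how the 0.0005 L2 weight decay would enter each update. Initialization from the Caffe reference model is outside the scope of this sketch.

    import numpy as np

    EPOCHS = 60
    WEIGHT_DECAY = 5e-4  # L2-regularizer coefficient quoted from the paper

    # Assumed log-spaced schedule from 0.01 down to 0.0001 over 60 epochs;
    # the paper's quote only gives the endpoints, not the decay shape.
    learning_rates = np.logspace(np.log10(0.01), np.log10(0.0001), EPOCHS)

    def sgd_step(params, grads, lr):
        # One SGD update with the L2 penalty folded into the gradient:
        # p <- p - lr * (g + WEIGHT_DECAY * p)
        return [p - lr * (g + WEIGHT_DECAY * p) for p, g in zip(params, grads)]

    for epoch, lr in enumerate(learning_rates):
        # A real run would train one epoch here, e.g. train_one_epoch(model, lr),
        # starting from Caffe-reference-model weights as the paper describes.
        pass

Folding the weight decay into the gradient, as above, is the standard way an L2 regularizer over the model parameters enters a plain SGD update.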