Learning End-to-end Video Classification with Rank-Pooling
Authors: Basura Fernando, Stephen Gould
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on action and activity recognition tasks in video using two real-world datasets, and compare our approach against some strong baseline methods. |
| Researcher Affiliation | Academia | Basura Fernando (BASURA.FERNANDO@ANU.EDU.AU), Research School of Engineering, The Australian National University, ACT 2601, Australia; Stephen Gould (STEPHEN.GOULD@ANU.EDU.AU), Research School of Computer Science, The Australian National University, ACT 2601, Australia |
| Pseudocode | No | The paper does not contain a pseudocode block or a clearly labeled algorithm. |
| Open Source Code | No | The paper mentions using 'publicly available code (Fernando et al., 2015)' for a baseline method, but it does not state that the source code for the methodology described in *this* paper is open-source or provide a link. |
| Open Datasets | Yes | First, we use the UCF-sports dataset (Rodriguez et al., 2008) for the task of action classification. Second, we use the Hollywood2 dataset (Laptev et al., 2008) for the task of activity recognition. |
| Dataset Splits | Yes | We use the provided train-test splits for training and testing. Hollywood2 has 1,707 videos in total with a pre-defined split of 823 training videos and 884 test videos. |
| Hardware Specification | Yes | Using the full gradient optimization is ten times slower than the approximate method, resulting in processing videos at 5 frames per second versus 50 frames per second (for the approximate method) during training on a Titan-X GPU. |
| Software Dependencies | No | The paper mentions software like the 'Caffe reference model' and 'MatConvNet', but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We initialize the network with the Caffe reference model and use a variable learning rate starting from 0.01 down to 0.0001 over 60 epochs. We also use a weight decay of 0.0005 on an L2-regularizer over the model parameters. (An illustrative sketch of this setup follows the table.) |
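Since the paper releases no code, the following is a minimal sketch of the quoted setup in NumPy. It pairs one common regression-based approximation of rank pooling (the paper's core temporal pooling operator) with the hyperparameters quoted in the Experiment Setup row. The function name, the ridge-regression formulation, the smoothing choice, and the decay shape of the learning-rate schedule are illustrative assumptions, not the authors' implementation (which used Caffe/MatConvNet).

```python
import numpy as np

def approx_rank_pool(frames, lam=1.0):
    """One common approximation of rank pooling: fit a ridge regression
    that maps (smoothed) per-frame features to their temporal order and
    use the learned weight vector as the video descriptor.

    frames: (T, D) array of per-frame CNN features.
    Returns a (D,) descriptor. Illustrative sketch, not the paper's code.
    """
    T, D = frames.shape
    t = np.arange(1, T + 1, dtype=float)   # temporal-order targets 1..T
    # Time-varying mean smoothing, as used in rank-pooling pipelines.
    X = np.cumsum(frames, axis=0) / t[:, None]
    # Closed-form ridge solution: u = (X^T X + lam*I)^{-1} X^T t
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ t)

# Hyperparameters quoted in the 'Experiment Setup' row: learning rate
# decayed from 0.01 to 0.0001 over 60 epochs (the log-linear decay shape
# is an assumption; the paper gives only the endpoints), weight decay 5e-4.
learning_rates = np.logspace(-2, -4, num=60)
weight_decay = 5e-4
```

Note that this sketch shows only the forward pooling computation; the full-gradient variant referenced in the Hardware Specification row additionally backpropagates through the pooling step, which is what makes it roughly ten times slower than the approximate method.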