Learning to encode motion using spatio-temporal synchrony
Authors: Kishore Reddy Konda; Roland Memisevic; Vincent Michalski
ICLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This makes it possible to achieve competitive performance in a wide variety of motion estimation tasks, using a small fraction of the time required to learn features, and to outperform hand-crafted spatio-temporal features by a large margin. |
| Researcher Affiliation | Academia | Kishore Konda (KONDA@INFORMATIK.UNI-FRANKFURT.DE), Goethe University Frankfurt, Frankfurt; Roland Memisevic (ROLAND.MEMISEVIC@UMONTREAL.CA), University of Montreal, Montreal; Vincent Michalski (VMICHALS@RZ.UNI-FRANKFURT.DE), Goethe University Frankfurt, Frankfurt |
| Pseudocode | No | The paper presents mathematical equations for the learning rules (e.g., Equations 6-15) but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the source code for their methodology is provided or available, nor does it provide a link to a repository. |
| Open Datasets | Yes | We evaluated our models on several popular activity recognition benchmark datasets: KTH (Schuldt et al., 2004): Six actions performed by 25 subjects. UCF sports (Rodriguez et al., 2008): Ten action classes. Hollywood2 (Marszałek et al., 2009): Twelve activity classes. YUPENN dynamic scenes (Derpanis, 2012): Fourteen scene categories... |
| Dataset Splits | Yes | KTH (Schuldt et al., 2004): Six actions performed by 25 subjects. Samples divided into train and test data according to the authors' original split. UCF sports (Rodriguez et al., 2008): Ten action classes...we use leave-one-out for evaluation. Hollywood2 (Marszałek et al., 2009): Twelve activity classes. It consists of 884 test samples and 823 train samples... |
| Hardware Specification | Yes | All experiments were performed on a system with a 3.20GHz CPU, 24GB RAM and a GTX 680 GPU. |
| Software Dependencies | No | The paper mentions using 'the theano library (Bergstra et al., 2010)' for GPU implementations, but does not provide specific version numbers for Theano or other software dependencies. |
| Experiment Setup | Yes | We train our models on PCA-whitened input patches of size 10×16×16. The number of training samples is 200,000. The number of product units is fixed at 300. For inference, sub-blocks of the same size as the patch size are cropped from super blocks of size 14×20×20 (Le et al., 2011). The sub-blocks are cropped with a stride of 4 on each axis, giving 8 sub-blocks per super block. The feature responses of the sub-blocks are concatenated and dimensionally reduced using PCA to form the local feature. Using a separate layer of K-means, a vocabulary of 3000 spatio-temporal words is learned with 500,000 samples for training. In all our experiments the super blocks are cropped densely from the video with a 50% overlap. Finally, a χ²-kernel SVM on the histogram of spatio-temporal words is used for classification. |
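The sub-block cropping arithmetic in the reported setup (10×16×16 patches cropped from 14×20×20 super blocks with stride 4, yielding 8 sub-blocks) can be sanity-checked with a short sketch. The helper `crop_sub_blocks` below is a hypothetical illustration of the described windowing, not the authors' code:

```python
import numpy as np

def crop_sub_blocks(super_block, sub_shape=(10, 16, 16), stride=4):
    """Crop every sub-block of shape `sub_shape` from a 3-D super block,
    sliding with the given stride along each axis (t, y, x)."""
    T, Y, X = super_block.shape
    t_s, y_s, x_s = sub_shape
    blocks = []
    for t in range(0, T - t_s + 1, stride):
        for y in range(0, Y - y_s + 1, stride):
            for x in range(0, X - x_s + 1, stride):
                blocks.append(super_block[t:t + t_s, y:y + y_s, x:x + x_s])
    return np.stack(blocks)

# A 14x20x20 super block, as in the paper, admits 2 offsets per axis
# ((14-10)/4 + 1 = 2, (20-16)/4 + 1 = 2), so 2*2*2 = 8 sub-blocks.
super_block = np.random.randn(14, 20, 20)
subs = crop_sub_blocks(super_block)
print(subs.shape)  # (8, 10, 16, 16)
```

Flattening and concatenating the 8 per-sub-block feature vectors before PCA, as the setup describes, then gives one local descriptor per super block.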