Is My Object in This Video? Reconstruction-based Object Search in Videos
Authors: Tan Yu, Jingjing Meng, Junsong Yuan
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on three benchmark datasets demonstrate the promising performance of the proposed ROSE method. |
| Researcher Affiliation | Academia | Tan Yu, Jingjing Meng, Junsong Yuan; ROSE Lab, Interdisciplinary Graduate School, Nanyang Technological University, Singapore; {tyu008, jingjing.meng, jsyuan}@ntu.edu.sg |
| Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include any clearly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or include a link to a code repository for the methodology described. |
| Open Datasets | Yes | We conduct systematic experiments on the CNN-2h [Araujo et al., 2014], Egocentric1 [Chandrasekhar et al., 2014] and Egocentric2 [Chandrasekhar et al., 2014] datasets. |
| Dataset Splits | No | The paper describes the datasets used and how object proposals are generated but does not specify training, validation, and test splits (e.g., percentages or counts for each). |
| Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments, such as CPU or GPU models, or cloud computing instances with their specifications. |
| Software Dependencies | No | The paper mentions using "Edge Boxes" and extracting features from "VGG-16 CNN model pre-trained on Imagenet dataset", as well as techniques like PCA, k-means clustering, and orthogonal matching pursuit. However, it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In this paper, we adopt Edge Boxes [Zitnick and Dollár, 2014] to generate 300 object proposals for each frame of the videos. For each object proposal, we further extract its feature by max-pooling the last convolutional layer of the VGG-16 CNN model [Simonyan and Zisserman, 2014] pre-trained on the ImageNet dataset. The max-pooled 512-dimensional features are further post-processed by principal component analysis (PCA) and whitening... we use k-means clustering to group 300 object proposals from every frame into 30 clusters and select the centroids of the clusters as compact object proposals. ...The encoder activation function f1(·) is implemented by the rectifier defined as f1(x) = max(0, x). The decoder activation function is implemented by the linear function f2(x) = x. We set the expected average activations ρ̂ of all the hidden neural nodes as 0.05. ...We set the default number of z as 2 on all three datasets. ...We conduct the comparison on the condition when the number of atoms/representatives l is set to be 200... (illustrative sketches of this setup follow the table) |
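
The experiment-setup quote above describes a per-frame compaction step: 300 proposal features (512-D, from max-pooling VGG-16's last convolutional layer) are PCA-whitened, then k-means groups each frame's proposals into 30 clusters whose centroids serve as compact proposals. Below is a minimal sketch of that step. The random features stand in for the real Edge Boxes + VGG-16 pipeline, which this sketch does not reproduce, and the PCA output dimension of 256 is an assumption; the paper's quoted passage does not state it.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_frames, n_proposals, feat_dim = 5, 300, 512   # 300 proposals, 512-D (from the paper)
# Stand-in for max-pooled VGG-16 conv features of Edge Boxes proposals.
features = rng.standard_normal((n_frames * n_proposals, feat_dim))

# PCA + whitening, fit over all proposal features of the video.
pca = PCA(n_components=256, whiten=True)        # output dim is an assumption
whitened = pca.fit_transform(features).reshape(n_frames, n_proposals, -1)

# Per frame: group 300 proposals into 30 clusters; keep the centroids
# as the "compact object proposals" described in the setup.
compact = np.stack([
    KMeans(n_clusters=30, n_init=10, random_state=0)
    .fit(frame_feats).cluster_centers_
    for frame_feats in whitened
])
print(compact.shape)  # (5, 30, 256): 30 compact proposals per frame
```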
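The quote also specifies the autoencoder configuration: a rectifier encoder f1(x) = max(0, x), a linear decoder f2(x) = x, and an expected average hidden activation ρ̂ = 0.05. The sketch below is a hedged illustration, assuming the standard KL-divergence sparsity penalty, a hidden width of 128, and a penalty weight of 0.1; none of these three choices is stated in the quoted passage.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, in_dim=256, hidden_dim=128):  # sizes are assumptions
        super().__init__()
        # ReLU encoder f1(x) = max(0, x); linear decoder f2(x) = x.
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

def sparsity_penalty(hidden, rho_hat=0.05, eps=1e-8):
    """KL(rho_hat || rho_j) summed over hidden units, where rho_j is the
    batch-mean activation of unit j. Clamping keeps the logs finite since
    ReLU activations are not bounded above by 1."""
    rho = hidden.mean(dim=0).clamp(eps, 1 - eps)
    return (rho_hat * torch.log(rho_hat / rho)
            + (1 - rho_hat) * torch.log((1 - rho_hat) / (1 - rho))).sum()

model = SparseAutoencoder()
x = torch.randn(32, 256)            # stand-in batch of proposal features
recon, hidden = model(x)
# Reconstruction loss plus sparsity penalty; the 0.1 weight is an assumption.
loss = nn.functional.mse_loss(recon, x) + 0.1 * sparsity_penalty(hidden)
loss.backward()
```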