Is My Object in This Video? Reconstruction-based Object Search in Videos

Authors: Tan Yu, Jingjing Meng, Junsong Yuan

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Comprehensive experiments on three benchmark datasets demonstrate the promising performance of the proposed ROSE method. |
| Researcher Affiliation | Academia | Tan Yu, Jingjing Meng, Junsong Yuan; ROSE Lab, Interdisciplinary Graduate School, Nanyang Technological University, Singapore ({tyu008, jingjing.meng, jsyuan}@ntu.edu.sg) |
| Pseudocode | No | The paper describes its algorithms and mathematical formulations but does not include any clearly labeled pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper contains no statement about releasing source code and no link to a code repository for the described method. |
| Open Datasets | Yes | "We conduct the systematic experiments on CNN-2h [Araujo et al., 2014], Egocentric1 [Chandrasekhar et al., 2014] and Egocentric2 [Chandrasekhar et al., 2014] datasets." |
| Dataset Splits | No | The paper describes the datasets used and how object proposals are generated, but it does not specify training, validation, and test splits (e.g., percentages or counts for each). |
| Hardware Specification | No | The paper does not mention the hardware used to run the experiments, such as CPU or GPU models, or cloud computing instances with their specifications. |
| Software Dependencies | No | The paper mentions using Edge Boxes and extracting features from a VGG-16 CNN model pre-trained on the ImageNet dataset, as well as PCA, k-means clustering, and orthogonal matching pursuit, but it does not give version numbers for any software dependency or library. |
| Experiment Setup | Yes | "In this paper, we adopt Edge Boxes [Zitnick and Dollár, 2014] to generate 300 object proposals for each frame of the videos. For each object proposal, we further extract its feature by max-pooling the last convolutional layer of the VGG-16 CNN model [Simonyan and Zisserman, 2014] pre-trained on the ImageNet dataset. The max-pooled 512-dimensional features are further post-processed by principal component analysis (PCA) and whitening. ... We use k-means clustering to group the 300 object proposals from every frame into 30 clusters and select the centroids of the clusters as compact object proposals. ... The encoder activation function f1(·) is implemented by the rectifier f1(x) = max(0, x). The decoder activation function is implemented by the linear function f2(x) = x. We set the expected average activation ρ̂ of all the hidden neural nodes to 0.05. ... We set the default number of z as 2 on all three datasets. ... We conduct the comparison on the condition when the number of atoms/representatives l is set to 200." |
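
The quoted setup is concrete enough to sketch in code. Below is a minimal, hedged reconstruction of the per-frame proposal pipeline, not the authors' implementation: torchvision's pre-trained VGG-16 and scikit-learn stand in for the unspecified tooling, and `edge_boxes` is a hypothetical callable standing in for the Edge Boxes proposal generator [Zitnick and Dollár, 2014], for which the paper gives no implementation details.

```python
# Sketch of the quoted pipeline: 300 proposals per frame, 512-d max-pooled
# VGG-16 features, PCA + whitening, then 30 k-means centroids per frame.
import numpy as np
import torch
import torchvision.models as models
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Convolutional trunk of VGG-16 pre-trained on ImageNet, as in the paper.
vgg_conv = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

def proposal_feature(frame: torch.Tensor, box) -> np.ndarray:
    """Max-pool the last conv layer of VGG-16 over one proposal crop.

    frame: (3, H, W) tensor, already ImageNet-normalized.
    box:   (x1, y1, x2, y2) proposal coordinates; the crop is assumed
           large enough to survive VGG-16's five pooling stages.
    """
    x1, y1, x2, y2 = box
    crop = frame[:, y1:y2, x1:x2].unsqueeze(0)
    with torch.no_grad():
        fmap = vgg_conv(crop)                        # (1, 512, h', w')
    return fmap.amax(dim=(2, 3)).squeeze(0).numpy()  # 512-d max-pooled vector

def compact_proposals(frame, edge_boxes, n_props=300, n_clusters=30):
    """300 proposals -> PCA-whitened features -> 30 k-means centroids."""
    feats = np.stack([proposal_feature(frame, b)
                      for b in edge_boxes(frame, n_props)])
    # PCA + whitening post-processing; the paper does not state the output
    # dimensionality or fitting scope, so PCA is fit per frame here with all
    # components kept (a global fit over the dataset is equally plausible).
    feats = PCA(whiten=True).fit_transform(feats)
    # Cluster centroids serve as the frame's compact object proposals.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(feats)
    return km.cluster_centers_
```

The autoencoder settings quoted above (rectifier encoder, linear decoder, target mean activation ρ̂ = 0.05) similarly admit a short sketch. The hidden width (256) and sparsity penalty weight (1e-3) below are illustrative assumptions, not values from the paper.

```python
# Sparse autoencoder per the quoted settings: encoder f1(x) = max(0, x),
# decoder f2(x) = x, and a KL-divergence sparsity term pulling each hidden
# unit's mean activation toward the paper's target rho-hat = 0.05.
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, dim_in=512, dim_hidden=256, rho=0.05):
        super().__init__()
        self.enc = nn.Linear(dim_in, dim_hidden)
        self.dec = nn.Linear(dim_hidden, dim_in)
        self.rho = rho  # target average activation (the paper's rho-hat)

    def forward(self, x):
        h = torch.relu(self.enc(x))   # encoder: rectifier f1(x) = max(0, x)
        return self.dec(h), h         # decoder: linear f2(x) = x

    def sparsity_penalty(self, h, eps=1e-6):
        # KL(rho || mean activation) summed over hidden units; the batch
        # means are clamped to (0, 1) only to keep the logs defined under ReLU.
        mean_act = h.mean(dim=0).clamp(eps, 1 - eps)
        rho = self.rho
        return (rho * torch.log(rho / mean_act)
                + (1 - rho) * torch.log((1 - rho) / (1 - mean_act))).sum()

# Illustrative training objective: reconstruction error plus weighted sparsity.
x = torch.randn(64, 512)              # a batch of PCA-whitened proposal features
model = SparseAE()
x_hat, h = model(x)
loss = nn.functional.mse_loss(x_hat, x) + 1e-3 * model.sparsity_penalty(h)
```

Note that the KL sparsity term is conventionally paired with sigmoid units; the clamp above is only a device to keep it well defined with the rectifier encoder the paper describes.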