What Is Where: Inferring Containment Relations from Videos

Authors: Wei Liang, Yibiao Zhao, Yixin Zhu, Song-Chun Zhu

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the proposed method on our dataset with 1326 video clips taken in 9 indoor scenes, including some challenging cases, such as heavy occlusions and diverse changes of containment relations. The experimental results demonstrate good performance on the dataset."
Researcher Affiliation | Academia | School of Computer Science, Beijing Institute of Technology (BIT), China; Center for Vision, Cognition, Learning, & Autonomy, University of California, Los Angeles, USA
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that its source code is publicly available, nor does it link to a code repository.
Open Datasets | No | The paper states: "We collected a RGB-D video dataset with diverse actions to evaluate the proposed method." It describes the dataset and its properties but does not provide a link, DOI, repository name, or formal citation for public access to this collected dataset.
Dataset Splits | No | The paper mentions that "800 clips are used to train our model and the remaining clips are for testing" and refers to cross-validation during the training phase for obtaining the weights $\lambda_1$ and $\lambda_2$. However, it gives no specific percentages or sample counts for a distinct validation split (see the split sketch after this table).
Hardware Specification | No | The paper mentions that the dataset was "captured by a Kinect sensor" but gives no details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments or train the models.
Software Dependencies | No | The paper does not explicitly list any software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA x.x).
Experiment Setup | Yes | The energy of containment relations is defined as $\phi(G_t, V_t) = \lambda_1 \phi_{\mathrm{IN}} + \lambda_2 \phi_{\mathrm{ON}} + \phi_{\mathrm{AFF}}$, where $\lambda_1$ and $\lambda_2$ are the weights of the energy terms, obtained through cross-validation during the training phase (see the energy sketch after this table). The window sizes and sliding steps are both multi-scale.
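
The split reported in the Dataset Splits row (800 of the 1326 clips for training, the remainder for testing) could be reproduced in spirit with a simple partition of clip identifiers. The paper does not say how clips were selected, so the random-shuffle strategy, the fixed seed, and the variable names below are illustrative assumptions, not the authors' procedure.

```python
import random

# Hypothetical reconstruction of the train/test partition described in the
# paper: 800 of the 1326 clips are used for training, the rest for testing.
# Whether the original split was random or scene-stratified is unknown.
NUM_CLIPS = 1326
NUM_TRAIN = 800

clip_ids = list(range(NUM_CLIPS))
random.seed(0)                      # fixed seed only for repeatability of this sketch
random.shuffle(clip_ids)

train_ids = clip_ids[:NUM_TRAIN]    # 800 training clips
test_ids = clip_ids[NUM_TRAIN:]     # remaining 526 clips for testing

assert len(test_ids) == NUM_CLIPS - NUM_TRAIN  # 1326 - 800 = 526
```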
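
The Experiment Setup row combines three energy terms with two weights chosen by cross-validation. The sketch below shows one way such a weighted energy and a grid search over $\lambda_1, \lambda_2$ could look; the scoring function, the candidate grid, and all names are assumptions for illustration, not the authors' implementation.

```python
from itertools import product

def containment_energy(phi_in, phi_on, phi_aff, lam1, lam2):
    """Weighted energy phi(G_t, V_t) = lam1*phi_IN + lam2*phi_ON + phi_AFF."""
    return lam1 * phi_in + lam2 * phi_on + phi_aff

def cross_validate_weights(folds, candidate_weights, evaluate_fold):
    """Pick the (lam1, lam2) pair with the best average score across folds.

    `folds` is a list of (train, val) splits and `evaluate_fold` is a
    user-supplied scoring function; both are placeholders, since the paper
    only states that the weights are obtained through cross-validation.
    """
    best, best_score = None, float("-inf")
    for lam1, lam2 in candidate_weights:
        score = sum(evaluate_fold(train, val, lam1, lam2)
                    for train, val in folds) / len(folds)
        if score > best_score:
            best, best_score = (lam1, lam2), score
    return best

# Illustrative candidate grid for the two weights (values are assumptions).
grid = list(product([0.1, 0.5, 1.0, 2.0], repeat=2))
```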