Adaptive Image-to-Video Scene Graph Generation via Knowledge Reasoning and Adversarial Learning
Authors: Jin Chen, Xiaofeng Ji, Xinxiao Wu
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiment results on two benchmark video datasets demonstrate the effectiveness of our method." "Extensive experiments on the benchmark dataset have validated the effectiveness of our method." |
| Researcher Affiliation | Academia | Jin Chen, Xiaofeng Ji, Xinxiao Wu* Beijing Laboratory of Intelligent Information Technology School of Computer Science, Beijing Institute of Technology, Beijing, China {chenjin, jixf, wuxinxiao}@bit.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is publicly available or provide a link to it. |
| Open Datasets | Yes | To evaluate the proposed method, we conduct experiments on two video benchmark datasets, i.e., the VidVRD dataset (Shang et al. 2017) and the VidOR dataset (Shang et al. 2019). With the VidVRD dataset as the target domain, we use the VRD dataset (Lu et al. 2016) as the source image domain. With the VidOR dataset as the target video domain, we use the VG dataset (Zhang et al. 2017) as the source image domain. |
| Dataset Splits | No | The paper mentions training data and that target video annotations are only used for evaluation, but it does not provide specific numerical percentages or counts for training, validation, and test splits, nor does it refer to specific predefined splits by name. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Faster R-CNN and ResNet101, but it does not specify version numbers for these or any other key software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | The shorter side of images and video frames is resized to 600 while preserving the aspect ratio. The dimension of the second-order statistic descriptor is set to 512 and the hyperparameter r in the factorized bilinear pooling is set to 5. The image domain classifier Dimg and the instance domain classifier Dins are designed using five fully-connected layers (1024→512→256→128→1) and three convolution layers (512→128→1), respectively. The visual mapping φ and the language mapping ϕ consist of three fully-connected layers (256→256→300) and two fully-connected layers (1024→300), respectively. During testing, non-maximum suppression with an IoU threshold of 0.3 is used to select boxes from object proposals, and the selected boxes with a confidence score greater than 0.5 are taken as the final detected objects for predicting relationships. |
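
Since no source code is released, the following is a minimal PyTorch sketch of the setup described in the Experiment Setup row, for illustration only. The paper specifies only the layer widths, the NMS IoU threshold (0.3), and the confidence threshold (0.5); everything else here (ReLU activations, 1×1 convolution kernels, and the `select_boxes` helper name) is an assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import nms


class ImageDomainClassifier(nn.Module):
    """Dimg: five fully-connected layers (1024->512->256->128->1).

    ReLU activations between layers are an assumption; the paper
    only lists the layer widths.
    """

    def __init__(self):
        super().__init__()
        dims = [1024, 512, 256, 128, 1]
        layers = []
        for i in range(len(dims) - 1):
            layers.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:  # no activation after the final logit
                layers.append(nn.ReLU(inplace=True))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)  # one domain logit per image-level feature


class InstanceDomainClassifier(nn.Module):
    """Dins: three convolution layers (512->128->1).

    1x1 kernels are an assumption; the paper does not state kernel sizes.
    """

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(512, 128, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)  # per-location domain logits


def select_boxes(boxes, scores, iou_thresh=0.3, score_thresh=0.5):
    """Test-time box selection as described: NMS at IoU 0.3, then keep
    boxes whose confidence exceeds 0.5."""
    keep = nms(boxes, scores, iou_thresh)
    keep = keep[scores[keep] > score_thresh]
    return boxes[keep], scores[keep]
```

The two thresholds in `select_boxes` come directly from the quoted setup; the classifier sketches only reproduce the stated layer widths, so they should be read as a plausible reconstruction rather than the method itself.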