Improving Scene Graph Classification by Exploiting Knowledge from Texts

Authors: Sahand Sharifzadeh, Sina Moayed Baharlou, Martin Schmitt, Hinrich Schütze, Volker Tresp

AAAI 2022, pp. 2189–2197 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that by fine-tuning the classification pipeline with the extracted knowledge from texts, we can achieve 8x more accurate results in scene graph classification, 3x in object classification, and 1.5x in predicate classification, compared to the supervised baselines with only 1% of the annotated images. We evaluate our approach on the Visual Genome dataset.
Researcher Affiliation | Collaboration | Sahand Sharifzadeh (1*), Sina Moayed Baharlou (1*), Martin Schmitt (2), Hinrich Schütze (2), Volker Tresp (1,3). 1: Department of Informatics, LMU Munich, Germany; 2: Center for Information and Language Processing (CIS), LMU Munich, Germany; 3: Siemens AG, Munich, Germany
Pseudocode | Yes | Algorithm 1: Classify objects/predicates from images; Algorithm 2: Fine-tune the relational reasoning component from textual triples using a denoising auto-encoder paradigm. (A sketch of the denoising step follows the table.)
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We use the sanitized version [Xu et al. 2017] of the Visual Genome (VG) dataset [Krishna et al. 2017], including images and their annotations, i.e., bounding boxes, scene graphs, and scene descriptions.
Dataset Splits | No | The paper specifies 'training images' (1% or 10% of VG data) and 'test sets' but does not explicitly define a separate validation set with specific percentages or counts.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as particular GPU or CPU models.
Software Dependencies | No | The paper mentions models and architectures like 'ResNet-50', 'Graph Transformer layers', and 'T5-small model' but does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | To this end, we assume only a random proportion (1% or 10%) of training images are annotated (parallel set containing IM with corresponding SG and TXT). We consider the remaining data (99% or 90%) as our text set and discard their IM and SG. We use four different random splits [Sharifzadeh, Baharlou, and Tresp 2021] to avoid a sampling bias. We fine-tune the pre-trained T5 model on parallel TXT and SG. Randomly set 20% of the nodes and edges in E to zero. (A split-construction sketch follows the denoising sketch below.)
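
The paper's pseudocode is not reproduced in this report. Below is a minimal, hypothetical PyTorch sketch of the denoising auto-encoder fine-tuning that Algorithm 2 and the setup quote describe: embed a batch of (subject, predicate, object) class ids from textual triples, zero out a random 20% of the embeddings, and train a transformer-based relational reasoner to reconstruct the original classes. All module, function, and parameter names (RelationalReasoner, finetune_step, mask_p) are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalReasoner(nn.Module):
    # Stand-in for the paper's graph-transformer-based reasoning component.
    def __init__(self, n_classes, dim=256):
        super().__init__()
        self.embed = nn.Embedding(n_classes, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, class_ids, mask_p=0.2):
        e = self.embed(class_ids)  # (batch, seq, dim)
        # Denoising corruption: zero a random 20% of embeddings, matching
        # "Randomly set 20% of the nodes and edges in E to zero".
        keep = (torch.rand(e.shape[:2], device=e.device) > mask_p).float()
        e = e * keep.unsqueeze(-1)
        return self.classifier(self.encoder(e))  # reconstructed class logits

def finetune_step(model, optimizer, triples):
    # One auto-encoding step on a batch of textual triples of shape (batch, 3),
    # holding class indices for (subject, predicate, object).
    logits = model(triples)
    loss = F.cross_entropy(logits.flatten(0, 1), triples.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Training the reasoner to undo this corruption is what lets knowledge extracted from text feed back into the classification pipeline.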
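
For the split protocol quoted in the Experiment Setup row, here is a plain sketch of how 1% (or 10%) parallel/text partitions over four random seeds could be drawn. The seed values, dataset size, and the helper name make_split are assumptions for illustration, not details from the paper.

import random

def make_split(image_ids, fraction, seed):
    # Sample `fraction` of images as the annotated "parallel" set (IM + SG + TXT);
    # the remainder becomes the text-only set, whose IM and SG are discarded.
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    cut = int(len(ids) * fraction)
    return ids[:cut], ids[cut:]

# Four different random splits to avoid a sampling bias, as the paper reports.
splits = [make_split(range(100_000), fraction=0.01, seed=s) for s in range(4)]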