Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation

Authors: Tao Liu, Rongjie Li, Chongyu Wang, Xuming He

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on the Visual Genome and Open Images v6 datasets demonstrate that our framework consistently achieves state-of-the-art performance, demonstrating its effectiveness in addressing the challenges of open-vocabulary scene graph generation.
Researcher Affiliation	Academia	1 ShanghaiTech University, Shanghai, China; 2 Shanghai Engineering Research Center of Intelligent Vision and Imaging. EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper describes methods using natural language and figures, but no explicit pseudocode or algorithm blocks are present.
Open Source Code	No	The paper does not contain any explicit statements about releasing code or links to a code repository.
Open Datasets	Yes	To evaluate the SGG task, we adopt two benchmarks: the VG150 version of the Visual Genome (VG) dataset (Krishna et al. 2017) and the Open Image v6 (OIV6) dataset (Kuznetsova et al. 2020).
Dataset Splits	Yes	In the VG dataset's PredCLS setting, we follow Epic's predicate split, selecting 70% of the categories as base predicates and the remaining 30% as novel predicates. In the SGDet setting, we follow the OvSGTR predicate split. For the OIV6 dataset, we use the predicate split from PGSG.
Hardware Specification	Yes	All experiments are implemented in PyTorch and trained on 4 NVIDIA A40 GPUs.
Software Dependencies	Yes	We employ the GPT-3.5-turbo, as our LLM. We adopt CLIP (Radford et al. 2021) (ViT-B/32) as our VLM backbone. ... All experiments are implemented in PyTorch
Experiment Setup	Yes	We set k = 3 to dynamically select and set α = 0.25 to balance the weights of the two text prompts. For training losses, the weight of the entity detector is λ1 = 2, the weight for predicate prediction is λ2 = 1, and the weight for distillation loss is λ3 = 20.
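
The loss weights quoted in the Experiment Setup row can be read as coefficients in a combined training objective. The sketch below is only an illustration of that reading: it assumes the three terms are combined as a plain weighted sum, which the quoted text implies but does not state, and the function and constant names are hypothetical rather than taken from the paper. The parameters k = 3 and α = 0.25 are left out because the quoted text does not say how they enter the computation.

```python
# Loss weights as quoted in the Experiment Setup row.
LAMBDA_ENTITY = 2.0      # lambda_1: entity detector
LAMBDA_PREDICATE = 1.0   # lambda_2: predicate prediction
LAMBDA_DISTILL = 20.0    # lambda_3: distillation loss


def total_loss(entity_loss, predicate_loss, distill_loss):
    """Hypothetical helper: combine the three loss terms.

    Assumption: the terms are combined as a weighted sum. The quoted
    setup gives the weights but not the exact form of the objective.
    """
    return (
        LAMBDA_ENTITY * entity_loss
        + LAMBDA_PREDICATE * predicate_loss
        + LAMBDA_DISTILL * distill_loss
    )


if __name__ == "__main__":
    # Made-up loss values, for illustration only.
    print(total_loss(0.5, 0.25, 0.125))
```

Under this reading, the distillation term carries ten times the weight of the entity detector term and twenty times that of predicate prediction.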