Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Scene Graph Generation with Role-Playing Large Language Models
Authors: Guikun Chen, Jin Li, Wenguan Wang
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on prevalent benchmarks show that SDSGG outperforms top-leading methods by a clear margin. |
| Researcher Affiliation | Academia | Guikun Chen¹, Jin Li³, Wenguan Wang¹,²; ¹Zhejiang University; ²National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University; ³Changsha University of Science & Technology |
| Pseudocode | Yes | Algorithm S1 Pseudo-code for MVA of SDSGG in a PyTorch-like style. Algorithm S2 Pseudo-code for the forward process of SDSGG in a PyTorch-like style. |
| Open Source Code | Yes | The code will be publicly available at https://github.com/guikunchen/SDSGG. |
| Open Datasets | Yes | We evaluate our method on GQA [15] and VG [14] following [3, 12]. |
| Dataset Splits | No | VG is divided into two splits: base and novel split. The base split comprises 70% of the relation categories for training, while the novel split contains the remaining 30% categories invisible during training. No explicit separate "validation" dataset split is mentioned for hyperparameter tuning, only base and novel splits for categories. |
| Hardware Specification | Yes | One RTX 3090 is used for training. |
| Software Dependencies | No | The paper mentions "GPT-3.5 from OpenAI" and "CLIP" (specifically the ViT-B/32 architecture), and "PyTorch" in the context of pseudo-code, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Our model is trained with a batch size of 4. The initial learning rate, momentum, and weight decay are set to be 2e-2, 9e-1, 1e-4, respectively. We utilize the pre-trained weights of CLIP to initialize our model. |
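
The Experiment Setup and Dataset Splits rows can be collected into a small configuration sketch for anyone attempting a reproduction. This is a minimal illustration, not the authors' code: the names `TrainConfig` and `split_relation_categories` are hypothetical, the optimizer type is not stated in the rows above, and the ordering rule used to pick base versus novel categories is an assumption made only so the example runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the table above (hypothetical container)."""
    batch_size: int = 4
    learning_rate: float = 2e-2   # initial learning rate
    momentum: float = 9e-1
    weight_decay: float = 1e-4
    clip_arch: str = "ViT-B/32"   # CLIP backbone named in the Software Dependencies row
    num_gpus: int = 1             # "One RTX 3090 is used for training."


def split_relation_categories(categories, base_fraction=0.7):
    """Split relation categories into (base, novel) lists.

    Assumption: categories are split by list order purely for illustration;
    the table does not say how the 70% / 30% partition is chosen.
    """
    n_base = round(len(categories) * base_fraction)
    return list(categories[:n_base]), list(categories[n_base:])


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg)
    # Placeholder category names, not taken from VG or GQA.
    base, novel = split_relation_categories([f"rel_{i}" for i in range(10)])
    print(len(base), "base categories;", len(novel), "novel categories")
```

Because the table reports no validation split and no software versions, a reproduction would still need to choose those independently.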