Scene Graph Generation with Role-Playing Large Language Models

Authors: Guikun Chen, Jin Li, Wenguan Wang

Venue: NeurIPS 2024

Reproducibility checklist. Each variable below is listed with its assessed result and the supporting LLM response.
Research Type: Experimental
LLM Response: Extensive experiments on prevalent benchmarks show that SDSGG outperforms top-leading methods by a clear margin.
Researcher Affiliation: Academia
LLM Response: Guikun Chen (1), Jin Li (3), Wenguan Wang (1,2); 1 Zhejiang University, 2 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, 3 Changsha University of Science & Technology.
Pseudocode: Yes
LLM Response: Algorithm S1: pseudo-code for MVA of SDSGG in a PyTorch-like style. Algorithm S2: pseudo-code for the forward process of SDSGG in a PyTorch-like style.
Open Source Code: Yes
LLM Response: The code will be publicly available at https://github.com/guikunchen/SDSGG.
Open Datasets: Yes
LLM Response: We evaluate our method on GQA [15] and VG [14] following [3, 12].
Dataset Splits: No
LLM Response: VG is divided into two splits: a base split and a novel split. The base split comprises 70% of the relation categories and is used for training, while the novel split contains the remaining 30% of categories, which are unseen during training. No explicit separate validation split for hyperparameter tuning is mentioned; only the base and novel category splits are described.
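To make the base/novel protocol above concrete, here is a minimal sketch of a 70/30 partition over relation categories. The function name, seed, and the example predicate names are hypothetical; the paper does not specify how the categories were partitioned.

```python
# Hypothetical sketch of a 70/30 base/novel split over relation categories.
# The split logic, seed, and example predicates are illustrative, not from the paper.
import random

def split_relation_categories(categories, base_ratio=0.7, seed=0):
    """Partition relation categories into base (seen in training) and novel (unseen)."""
    rng = random.Random(seed)
    shuffled = list(categories)
    rng.shuffle(shuffled)
    n_base = int(len(shuffled) * base_ratio)
    return shuffled[:n_base], shuffled[n_base:]

# Example with a handful of VG-style predicate names.
predicates = ["on", "near", "riding", "eating", "holding",
              "wearing", "behind", "under", "above", "carrying"]
base_split, novel_split = split_relation_categories(predicates)
print(len(base_split), "base categories;", len(novel_split), "novel categories")
```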
Hardware Specification: Yes
LLM Response: One RTX 3090 is used for training.
Software Dependencies: No
LLM Response: The paper mentions GPT-3.5 from OpenAI, CLIP (specifically the ViT-B/32 architecture), and PyTorch in the context of the pseudo-code, but does not provide specific version numbers for these software dependencies.
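For reference, the CLIP backbone named above can be loaded as in the sketch below. This assumes the official openai/CLIP package, which the paper does not explicitly name; version pins are left open since none are given.

```python
# Sketch only: assumes the official openai/CLIP package
# (pip install git+https://github.com/openai/CLIP.git). The paper does not
# pin versions for CLIP, PyTorch, or the OpenAI API.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # pre-trained ViT-B/32 weights
```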
Experiment Setup: Yes
LLM Response: Our model is trained with a batch size of 4. The initial learning rate, momentum, and weight decay are set to 2e-2, 9e-1, and 1e-4, respectively. We utilize the pre-trained weights of CLIP to initialize our model.
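The reported hyperparameters map onto a standard SGD configuration. The sketch below uses the values quoted above; the placeholder module is hypothetical and stands in for the SDSGG network, which the paper initializes from CLIP pre-trained weights.

```python
# Minimal sketch of the reported training configuration. The placeholder
# module is hypothetical; the actual SDSGG model is initialized from CLIP
# pre-trained weights rather than built this way.
import torch

model = torch.nn.Linear(512, 512)  # placeholder standing in for the SDSGG network

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=2e-2,          # initial learning rate
    momentum=0.9,     # momentum (9e-1)
    weight_decay=1e-4,
)
# Training uses a batch size of 4, e.g.:
# loader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True)
```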