Scene Graph Generation with Role-Playing Large Language Models
Authors: Guikun Chen, Jin Li, Wenguan Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on prevalent benchmarks show that SDSGG outperforms top-leading methods by a clear margin. |
| Researcher Affiliation | Academia | Guikun Chen (1), Jin Li (3), Wenguan Wang (1,2); 1: Zhejiang University; 2: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University; 3: Changsha University of Science & Technology |
| Pseudocode | Yes | Algorithm S1: Pseudo-code for MVA of SDSGG in a PyTorch-like style. Algorithm S2: Pseudo-code for the forward process of SDSGG in a PyTorch-like style. |
| Open Source Code | Yes | The code will be publicly available at https://github.com/guikunchen/SDSGG. |
| Open Datasets | Yes | We evaluate our method on GQA [15] and VG [14] following [3, 12]. |
| Dataset Splits | No | VG is divided into two splits: base and novel. The base split comprises 70% of the relation categories for training, while the novel split contains the remaining 30%, which are unseen during training. No explicit separate "validation" split is mentioned for hyperparameter tuning; only the base/novel category splits are described (see the split sketch below the table). |
| Hardware Specification | Yes | One RTX 3090 is used for training. |
| Software Dependencies | No | The paper mentions "GPT-3.5 from OpenAI" and "CLIP" (specifically the ViT-B/32 architecture), and "PyTorch" in the context of pseudo-code, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Our model is trained with a batch size of 4. The initial learning rate, momentum, and weight decay are set to 2e-2, 9e-1, and 1e-4, respectively. We utilize the pre-trained weights of CLIP to initialize our model. (See the configuration sketch below the table.) |
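As a concrete illustration of the base/novel protocol in the dataset-splits row, here is a minimal sketch of a 70/30 category-level split. The relation list and the fixed seed are hypothetical placeholders, not the paper's actual VG relation vocabulary or split.

```python
import random

# Hypothetical relation vocabulary; the paper's actual VG relation list differs.
relations = ["on", "holding", "riding", "eating", "wearing",
             "standing on", "looking at", "carrying", "walking on", "sitting on"]

random.seed(0)  # fixed seed so the illustrative split is reproducible
shuffled = random.sample(relations, len(relations))

cut = int(0.7 * len(shuffled))
base_split = shuffled[:cut]    # 70% of relation categories, seen during training
novel_split = shuffled[cut:]   # remaining 30%, held out (unseen) at training time

print(f"base ({len(base_split)}): {base_split}")
print(f"novel ({len(novel_split)}): {novel_split}")
```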
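The dependency row names CLIP with the ViT-B/32 architecture but gives no version numbers. Below is a minimal sketch of loading those pre-trained weights with OpenAI's clip package; the install source is an assumption, since the paper pins no versions.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git (version unpinned in the paper)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Loads the ViT-B/32 checkpoint named in the paper; returns the model
# together with its image preprocessing transform.
model, preprocess = clip.load("ViT-B/32", device=device)
```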
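The experiment-setup row maps directly onto an optimizer configuration. The sketch below assumes SGD (the paper reports a momentum value, which suggests it) and uses a placeholder model and dummy data, since SDSGG's architecture is not reproduced here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder module standing in for SDSGG's trainable parameters.
model = torch.nn.Linear(512, 512)

# Hyperparameters quoted in the paper's experiment setup.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=2e-2,           # initial learning rate
    momentum=9e-1,     # momentum
    weight_decay=1e-4, # weight decay
)

# Batch size of 4, as reported; the dataset here is dummy data for illustration.
dataset = TensorDataset(torch.randn(16, 512), torch.randn(16, 512))
loader = DataLoader(dataset, batch_size=4, shuffle=True)
```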