Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Interaction-Centric Knowledge Infusion and Transfer for Open Vocabulary Scene Graph Generation
Authors: Lin Li, Chuhan Zhang, Dong Zhang, Chong Sun, Chen Li, Long Chen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on three benchmarks show that ACC achieves state-of-the-art performance, demonstrating the potential of interaction-centric paradigms for real-world applications. |
| Researcher Affiliation | Collaboration | Lin Li¹,², Chuhan Zhang¹,², Dong Zhang¹,², Chong Sun³, Chen Li³, Long Chen¹; ¹HKUST, ²AI Chip Center for Emerging Smart Systems, ³Tencent; EMAIL, EMAIL |
| Pseudocode | Yes | Pseudo-code detailing this process is in appendix D for clarity. ... D Pseudo Code To make the interaction-guided query selection (§3.2.1) process easier to understand, we provide pseudo-code for Step I and Step II in Algorithm S1 and Algorithm S2, respectively. |
| Open Source Code | Yes | https://github.com/HKUST-LongGroup/ACC |
| Open Datasets | Yes | To evaluate ACC, we conducted comprehensive experiments on the benchmark Visual Genome (VG) [19], GQA [16], and PSG [56] datasets to validate its effectiveness in addressing the key challenges of OVSGG. |
| Dataset Splits | Yes | Following standard setup [55], 70% of the images are used for training, 5,000 for validation, and the remaining for testing. ... PSG [56] offers 44,967 training, 1,000 test, and 3,000 validation images (sampled from training), with 133 object and 56 predicate categories. |
| Hardware Specification | Yes | Our models are trained with a batch size of 3, utilizing four/eight RTX 3090 GPUs for computation. |
| Software Dependencies | Yes | During the supervision generation phase (c.f. §3.1), we employ Llama2-7B [48] to generate counter-actions based on the prompts described in §C. ... keeping the visual backbone (Swin-T or Swin-B) and the text encoder (BERT-base [11]) frozen. |
| Experiment Setup | Yes | Our models are trained with a batch size of 3... For interaction-guided query selection (c.f. §3.2.1), we adopt the settings from [35, 9], where the total number of selected visual tokens K is set to 900, and the top-ranked interaction tokens L is fixed at 200. ... The weights β1 and β2 of the loss functions L_VRD and L_RRD are set to 0.1 and 0.5, respectively, to balance different optimization objectives. |
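
For readers who want to carry the quoted Experiment Setup values into their own scripts, the following is a minimal sketch that gathers them in one place. The class name, field names, and helper function are hypothetical and are not taken from the ACC repository. The weighted sum is only an assumed form for how β1 and β2 scale the two losses; the paper's full training objective is not quoted in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row (field names are hypothetical)."""
    batch_size: int = 3
    num_selected_tokens: int = 900      # K: total number of selected visual tokens
    num_interaction_tokens: int = 200   # L: top-ranked interaction tokens
    beta_vrd: float = 0.1               # beta_1: weight on L_VRD
    beta_rrd: float = 0.5               # beta_2: weight on L_RRD


def weighted_distillation_loss(loss_vrd: float, loss_rrd: float,
                               cfg: ReportedSetup = ReportedSetup()) -> float:
    """Assumed form: beta_1 * L_VRD + beta_2 * L_RRD.

    Any other terms of the full objective are not described in this report
    and are deliberately left out.
    """
    return cfg.beta_vrd * loss_vrd + cfg.beta_rrd * loss_rrd
```

With these weights, the L_RRD term contributes five times as much per unit of loss as the L_VRD term.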