Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Interaction-Centric Knowledge Infusion and Transfer for Open Vocabulary Scene Graph Generation

Authors: Lin Li, Chuhan Zhang, Dong Zhang, Chong Sun, Chen Li, Long Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on three benchmarks show that ACC achieves state-of-the-art performance, demonstrating the potential of interaction-centric paradigms for real-world applications.
Researcher Affiliation | Collaboration | Lin Li (1,2), Chuhan Zhang (1,2), Dong Zhang (1,2), Chong Sun (3), Chen Li (3), Long Chen (1); (1) HKUST, (2) AI Chip Center for Emerging Smart Systems, (3) Tencent; EMAIL, EMAIL
Pseudocode | Yes | Pseudo-code detailing this process is in appendix D for clarity. ... D Pseudo Code: To make the interaction-guided query selection (§3.2.1) process easier to understand, we provide pseudo-code for Step I and Step II in Algorithm S1 and Algorithm S2, respectively.
Open Source Code | Yes | https://github.com/HKUST-LongGroup/ACC
Open Datasets | Yes | To evaluate ACC, we conducted comprehensive experiments on the benchmark Visual Genome (VG) [19], GQA [16], and PSG [56] datasets to validate its effectiveness in addressing the key challenges of OVSGG.
Dataset Splits | Yes | Following standard setup [55], 70% of the images are used for training, 5,000 for validation, and the remaining for testing. ... PSG [56] offers 44,967 training, 1,000 test, and 3,000 validation images (sampled from training), with 133 object and 56 predicate categories.
Hardware Specification | Yes | Our models are trained with a batch size of 3, utilizing four/eight RTX 3090 GPUs for computation.
Software Dependencies | Yes | During the supervision generation phase (cf. §3.1), we employ Llama2-7B [48] to generate counter-actions based on the prompts described in §C. ... keeping the visual backbone (Swin-T or Swin-B) and the text encoder (BERT-base [11]) frozen.
Experiment Setup | Yes | Our models are trained with a batch size of 3... For interaction-guided query selection (cf. §3.2.1), we adopt the settings from [35, 9], where the total number of selected visual tokens K is set to 900, and the top-ranked interaction tokens L is fixed at 200. ... The weights β1 and β2 of the loss function L_VRD and L_RRD are set to 0.1 and 0.5, respectively, to balance different optimization objectives.
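
For readers who want to carry the quoted settings into their own scripts, the values in the Dataset Splits and Experiment Setup rows could be recorded roughly as below. This is a minimal sketch, not the authors' code: the class, field, and function names are hypothetical. The split helper assumes the 5,000 validation images are disjoint from the 70% training portion, which the quoted text does not state. The loss helper only combines the two weighted terms, since the table does not say how they enter the full objective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table; all field names are hypothetical."""
    batch_size: int = 3
    num_selected_tokens: int = 900     # K in the quoted text
    num_interaction_tokens: int = 200  # L in the quoted text
    beta1: float = 0.1                 # weight quoted for the first loss term
    beta2: float = 0.5                 # weight quoted for the second loss term


def vg_split_sizes(num_images: int, num_val: int = 5000):
    """Return (train, val, test) counts: 70% train, 5,000 val, remainder test.

    Assumption: validation images are taken outside the 70% training share.
    """
    num_train = num_images * 7 // 10  # integer arithmetic avoids float rounding
    num_test = num_images - num_train - num_val
    if num_test < 0:
        raise ValueError("not enough images for the requested split")
    return num_train, num_val, num_test


def weighted_aux_loss(loss_vrd: float, loss_rrd: float,
                      cfg: ReportedSetup = ReportedSetup()) -> float:
    """Weighted sum of the two loss terms using the quoted beta1 and beta2."""
    return cfg.beta1 * loss_vrd + cfg.beta2 * loss_rrd
```

The image total passed to `vg_split_sizes` is left to the caller because the table does not report it.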