Iterative Scene Graph Generation

Authors: Siddhesh Khandelwal, Leonid Sigal

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Through extensive experiments on Visual Genome [30] and Action Genome [25] benchmark datasets we show improved performance on the scene graph generation task. |
| Researcher Affiliation | Academia | Siddhesh Khandelwal (1,2) and Leonid Sigal (1,2,3). (1) Department of Computer Science, University of British Columbia; (2) Vector Institute for AI; (3) CIFAR AI Chair. {skhandel, lsigal}@cs.ubc.ca |
| Pseudocode | No | The paper describes the architecture and formulation but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | The code is available at github.com/ubc-vision/IterativeSG. |
| Open Datasets | Yes | Through extensive experiments on Visual Genome [30] and Action Genome [25] benchmark datasets we show improved performance on the scene graph generation task. Visual Genome [30] is licensed under the Creative Commons Attribution 4.0 International License. Action Genome [25] is licensed under the MIT license. |
| Dataset Splits | No | We use widely popular data splits for our experiments. We briefly describe this in Section 5. The hyperparameters and additional data details are also mentioned in the supplementary (Section B). |
| Hardware Specification | No | Hardware resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute. Additional support was provided by JELF CFI grant and Compute Canada under the RAC award. |
| Software Dependencies | No | We use ResNet-101 [22] as the backbone network for image feature extraction. |
| Experiment Setup | Yes | Implementation Details (transformer-based approach). We use ResNet-101 [22] as the backbone network for image feature extraction. Each of the subject, object, and predicate decoders has 6 layers, with a feature size of 256. The decoders use 300 queries. For training we use a batch size of 12 and an initial learning rate of 10^-4, which is gradually decayed. |
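
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which may help when trying to reproduce the run. The sketch below is illustrative only and is not taken from the authors' repository: the names `TrainConfig` and `learning_rate` are hypothetical, and because the source says only that the learning rate is "gradually decayed", the step-decay factor and interval are placeholder assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values stated in the Experiment Setup row.
    backbone: str = "resnet101"
    num_decoder_layers: int = 6   # per subject / object / predicate decoder
    hidden_dim: int = 256         # decoder feature size
    num_queries: int = 300
    batch_size: int = 12
    base_lr: float = 1e-4         # initial learning rate
    # ASSUMPTIONS (not from the source): the decay schedule is unspecified,
    # so a simple step decay with placeholder values is used here.
    lr_decay_factor: float = 0.1
    lr_decay_every: int = 10      # epochs between decay steps


def learning_rate(cfg: TrainConfig, epoch: int) -> float:
    """Step decay: scale base_lr by lr_decay_factor every lr_decay_every epochs."""
    return cfg.base_lr * cfg.lr_decay_factor ** (epoch // cfg.lr_decay_every)


cfg = TrainConfig()
# One decoder stack per triplet component, each with the same depth.
decoders = {name: cfg.num_decoder_layers for name in ("subject", "object", "predicate")}
```

Calling `learning_rate(cfg, 0)` returns the initial rate; later epochs depend entirely on the assumed schedule, so the actual decay settings should be checked against the paper's supplementary material (Section B) or the released code.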