Adaptive Visual Scene Understanding: Incremental Scene Graph Generation

Authors: Naitik Khandelwal, Xiao Liu, Mengmi Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results not only highlight the challenges of directly combining existing continual learning methods with SGG backbones but also demonstrate the effectiveness of our proposed approach, enhancing CSEGG efficiency while simultaneously preserving privacy and memory usage. All data and source code are publicly available here.
Researcher Affiliation | Academia | (1) College of Computing and Data Science, Nanyang Technological University (NTU), Singapore; (2) Deep Neuro Cognition Lab, Agency for Science, Technology and Research (A*STAR), Singapore
Pseudocode | No | The paper describes methods and processes but does not include any clearly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | All data and source code are publicly available here.
Open Datasets | Yes | Thus, we re-structure the Visual Genome dataset [25] and establish a novel and comprehensive CSEGG benchmark, where AI models are deployed to dynamic scenes where new objects and new relationships are introduced.
Dataset Splits | Yes | In CSEGG, to cater to the three continual learning scenarios below, we re-organize the Visual Genome [25] dataset and follow its standard image splits for training, validation, and test sets specified in [72].
Hardware Specification | Yes | All models are trained on 4 A5000 GPUs.
Software Dependencies | No | The paper mentions software components like the Stable Diffusion model [58] and the Adam optimizer, and uses implementations from [32] and [68], but does not provide specific version numbers for these or other key software dependencies.
Experiment Setup | Yes | For SGTR in Fig. S3 (a), the approach uniquely formulates the task as a bipartite graph construction problem. ... a batch size of 32 is used. All methods are optimized using the Adam optimizer with a base learning rate of 1 × 10^-4 and a weight decay of 1 × 10^-4. Object detection training is conducted only in the S2 and S3 scenarios. Each task in S2 is trained for 100 epochs, while each task in S3 is trained for 50 epochs. ... SGG (Scene Graph Generation) Training: In this stage, the entire SGTR model is fine-tuned while keeping the 2D-CNN feature extractor frozen. A batch size of 24 is employed, and the Adam optimizer is used with a base learning rate of 8 × 10^-5. In S1 and S3, each model is trained for 50 epochs per task, while in S2, 80 epochs per task are used. (A minimal optimizer sketch follows this table.)
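
The quoted hyperparameters map directly onto a standard two-stage training setup. Below is a minimal PyTorch sketch of the two optimizer configurations described in the row above; it is not the authors' released code. The attribute name model.backbone (standing in for the frozen 2D-CNN feature extractor) is a hypothetical placeholder, and the SGG-stage weight decay is left at Adam's default because the quote does not specify it.

import torch

def make_detection_optimizer(model: torch.nn.Module) -> torch.optim.Adam:
    # Object detection stage (scenarios S2 and S3): batch size 32,
    # Adam with base learning rate 1e-4 and weight decay 1e-4.
    return torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)

def make_sgg_optimizer(model: torch.nn.Module) -> torch.optim.Adam:
    # SGG fine-tuning stage: freeze the 2D-CNN feature extractor and
    # fine-tune the rest of SGTR with Adam at base learning rate 8e-5
    # (batch size 24). `model.backbone` is a hypothetical attribute name.
    for p in model.backbone.parameters():
        p.requires_grad = False
    trainable = [p for p in model.parameters() if p.requires_grad]
    # Weight decay for this stage is unspecified in the quote,
    # so Adam's default of 0 is kept.
    return torch.optim.Adam(trainable, lr=8e-5)

Epoch budgets then follow the quoted schedule: 100 epochs per task in S2 and 50 in S3 for object detection; 50 epochs per task in S1 and S3 and 80 in S2 for SGG training.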