Towards Flexible Visual Relationship Segmentation

Authors: Fangrui Zhu, Jianwei Yang, Huaizu Jiang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical validation across various datasets demonstrates that our framework outperforms existing models in standard, promptable, and open-vocabulary tasks, e.g., +1.9 mAP on HICO-DET, +11.4 Acc on VRD, +4.7 mAP on unseen HICO-DET.
Researcher Affiliation | Collaboration | Fangrui Zhu¹, Jianwei Yang², Huaizu Jiang¹; ¹Northeastern University, ²Microsoft Research
Pseudocode | No | The paper describes the model architecture and operations in text and diagrams, but does not include a formal pseudocode or algorithm block.
Open Source Code | No | The paper includes a project page link (https://neu-vi.github.io/FleVRS) but no explicit statement about open-sourcing the code for the described methodology or a direct link to a code repository.
Open Datasets | Yes | For HOI segmentation, we utilize two public benchmarks: HICO-DET [4] and V-COCO [18]. ... For panoptic SGG, we use the PSG dataset [85], sourced from COCO and VG [37] intersections...
Dataset Splits | No | The paper provides training and testing splits for the datasets (e.g., '44,329 images (35,801 training, 8,528 testing)' for HICO-DET) but does not explicitly mention a separate validation split or its size.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software components like 'AdamW', 'CLIP', 'SAM', 'Focal-T/L', 'DaViT-B/L' but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | During training, we set the input image to be 640 × 640, with batch size of 64. We optimize our network with AdamW [54] with a weight decay of 10^-4. We train all models for 30 epochs with an initial learning rate of 10^-4 decreased by 10 times at the 20th epoch. ... The loss weights λb, λd, λc and λgrd (superscript omitted) are set to 1, 1, 2, and 2.
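
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, which makes the step learning-rate schedule and the weighted loss sum concrete. The following is only a minimal sketch under stated assumptions, not the authors' code: the names `TrainConfig`, `lr_at_epoch`, and `total_loss` are hypothetical, epochs are assumed to be counted from 1 with the drop taking effect from epoch 20 onward, and the loss terms are keyed only by their subscripts (b, d, c, grd) because the row does not say what each term measures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    image_size: tuple = (640, 640)
    batch_size: int = 64
    weight_decay: float = 1e-4
    epochs: int = 30
    base_lr: float = 1e-4
    lr_drop_epoch: int = 20       # assumption: epochs are counted from 1
    lr_drop_factor: float = 0.1   # "decreased by 10 times"
    # Keys mirror the subscripts in the row; their meaning is not stated there.
    loss_weights: dict = field(
        default_factory=lambda: {"b": 1.0, "d": 1.0, "c": 2.0, "grd": 2.0}
    )


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Return the learning rate for a 1-indexed epoch under a single step decay."""
    if not 1 <= epoch <= cfg.epochs:
        raise ValueError(f"epoch must be in [1, {cfg.epochs}], got {epoch}")
    if epoch >= cfg.lr_drop_epoch:
        return cfg.base_lr * cfg.lr_drop_factor
    return cfg.base_lr


def total_loss(cfg: TrainConfig, terms: dict) -> float:
    """Return the weighted sum of per-term loss values keyed like cfg.loss_weights."""
    missing = set(cfg.loss_weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(cfg.loss_weights[k] * terms[k] for k in cfg.loss_weights)


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in (1, 19, 20, 30):
        print(epoch, lr_at_epoch(cfg, epoch))
    print(total_loss(cfg, {"b": 0.5, "d": 0.5, "c": 0.25, "grd": 0.25}))
```

In an actual training script the same values would typically be passed to an AdamW optimizer and a step scheduler. Because the hardware and library versions are not reported, the sketch stays framework-independent.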