Towards Flexible Visual Relationship Segmentation
Authors: Fangrui Zhu, Jianwei Yang, Huaizu Jiang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical validation across various datasets demonstrates that our framework outperforms existing models in standard, promptable, and open-vocabulary tasks, e.g., +1.9 mAP on HICO-DET, +11.4 Acc on VRD, +4.7 mAP on unseen HICO-DET. |
| Researcher Affiliation | Collaboration | Fangrui Zhu¹, Jianwei Yang², Huaizu Jiang¹ (¹Northeastern University, ²Microsoft Research) |
| Pseudocode | No | The paper describes the model architecture and operations in text and diagrams, but does not include a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper includes a project page link (https://neu-vi.github.io/FleVRS) but no explicit statement about open-sourcing the code or a direct link to a code repository. |
| Open Datasets | Yes | For HOI segmentation, we utilize two public benchmarks: HICO-DET [4] and V-COCO [18]. ... For panoptic SGG, we use the PSG dataset [85], sourced from COCO and VG [37] intersections... |
| Dataset Splits | No | The paper provides training and testing splits for the datasets (e.g., '44,329 images (35,801 training, 8,528 testing)' for HICO-DET) but does not explicitly mention a separate validation split or its size. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like 'AdamW', 'CLIP', 'SAM', 'Focal-T/L', 'DaViT-B/L' but does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | During training, we set the input image to be 640×640, with a batch size of 64. We optimize our network with AdamW [54] with a weight decay of 10^-4. We train all models for 30 epochs with an initial learning rate of 10^-4, decreased by 10 times at the 20th epoch. ... The loss weights λ_b, λ_d, λ_c, and λ_grd are set to 1, 1, 2, and 2. (See the sketch after this table.) |
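
The Experiment Setup row above translates directly into a standard training loop. Below is a minimal PyTorch sketch of that recipe; since the code is not released (see the Open Source Code row), the model and the four per-term losses are hypothetical placeholders, and only the optimizer, schedule, input/batch sizes, and loss weights are taken from the quoted text.

```python
# A minimal PyTorch sketch of the quoted recipe, assuming a standard
# supervised loop. nn.Linear and the four per-term losses below are
# hypothetical stand-ins; only the hyperparameters come from the paper.
import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(8, 4)  # placeholder for the actual FleVRS network

# AdamW with weight decay 1e-4 and initial learning rate 1e-4.
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
# Decrease the learning rate by 10x at the 20th of 30 epochs.
scheduler = MultiStepLR(optimizer, milestones=[20], gamma=0.1)

# Reported loss weights: lambda_b, lambda_d, lambda_c, lambda_grd.
LAMBDA_B, LAMBDA_D, LAMBDA_C, LAMBDA_GRD = 1.0, 1.0, 2.0, 2.0

for epoch in range(30):
    # One dummy step per epoch; real training iterates over 640x640
    # image batches of size 64.
    x = torch.randn(64, 8)
    out = model(x)
    # Placeholder per-term losses standing in for the paper's box,
    # mask, classification, and grounding terms.
    loss_b = out.pow(2).mean()
    loss_d = out.abs().mean()
    loss_c = out.mean().abs()
    loss_grd = out.std()
    loss = (LAMBDA_B * loss_b + LAMBDA_D * loss_d
            + LAMBDA_C * loss_c + LAMBDA_GRD * loss_grd)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

Here `MultiStepLR` with `milestones=[20]` and `gamma=0.1` is one natural reading of "decreased by 10 times at the 20th epoch".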