DeiSAM: Segment Anything with Deictic Prompting
Authors: Hikaru Shindo, Manuel Brack, Gopika Sudhakaran, Devendra S Dhami, Patrick Schramowski, Kristian Kersting
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As part of our evaluation, we propose the Deictic Visual Genome (DeiVG) dataset, containing paired visual input and complex, deictic textual prompts. Our empirical results demonstrate that DeiSAM is a substantial improvement over purely data-driven baselines for deictic promptable segmentation. |
| Researcher Affiliation | Academia | Hikaru Shindo (1), Manuel Brack (1,2), Gopika Sudhakaran (1,3), Devendra Singh Dhami (4), Patrick Schramowski (1,2,3,5), Kristian Kersting (1,2,3). Affiliations: (1) Technical University of Darmstadt; (2) German Research Center for AI (DFKI); (3) Hessian Center for AI (hessian.AI); (4) Eindhoven University of Technology; (5) Center for European Research in Trusted Artificial Intelligence (CERTAIN) |
| Pseudocode | Yes | Listing 1: Rules generated by LLMs. `cond1(X) :- on(X,Y), type(Y,boat).` `cond2(X) :- holding(X,Y), type(Y,umbrella).` `target(X) :- cond1(X), cond2(X).` (rendered as a runnable sketch below the table) |
| Open Source Code | Yes | Code: https://github.com/ml-research/deictic-segment-anything |
| Open Datasets | Yes | To facilitate a thorough evaluation of the novel deictic object segmentation tasks, we introduce the Deictic Visual Genome (DeiVG) dataset. Building on Visual Genome (Krishna et al., 2017), we construct pairs of deictic prompts and corresponding object annotations for real-world images... For each set, we randomly select 10k samples that we make publicly available to encourage further research. |
| Dataset Splits | Yes | We extracted instances from DeiVG datasets not used in VETO training (ca. 2000 samples)... which are divided into training, validation, and test splits that contain 1200, 400, and 400 instances, respectively. (see the split sketch below the table) |
| Hardware Specification | No | The NeurIPS checklist claims that details on the CPU, GPU, and RAM used are provided, but no specific hardware models or types are actually listed in the paper's text or appendices. |
| Software Dependencies | No | The paper mentions software such as gpt-3.5-turbo, ada-002, SAM (Kirillov et al., 2023), Grounded DINO, GLIP, OFA, SEEM, and NEUMANN (Shindo et al., 2024), but it does not consistently provide specific version numbers for these components or libraries, which a reproducible description would require. |
| Experiment Setup | Yes | The default DeiSAM configuration for the subsequent experiments uses... gpt-3.5-turbo as the LLM for rule generation, ada-002 as the embedding model for semantic unification, and SAM (Kirillov et al., 2023) for object segmentation... We set the box threshold to 0.3 and the text threshold to 0.25 for the SAM model. All generated rules are assigned a weight of 1.0... We used NEUMANN (Shindo et al., 2024) with γ = 0.01 for soft-logic operations, and the number of inference steps is set to 2... We used the RMSProp optimizer with a learning rate of 1e-2, and performed 200 steps of weight updates with a batch size of 1. The reasoner's inference steps were set to 4. We used an IoU score threshold of θ = 0.8. (collected in the configuration sketch below the table) |
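For illustration, here is a minimal sketch of how the LLM-generated rules quoted in the Pseudocode row could be evaluated over a toy scene graph. The facts, entity names, and the plain Boolean evaluation are illustrative assumptions; the paper itself runs such rules through the NEUMANN differentiable reasoner, not a Python loop like this.

```python
# A minimal sketch (assumption: NOT the paper's implementation, which uses
# the NEUMANN differentiable reasoner) showing how the generated rules
# could be checked over a toy scene graph. All facts are illustrative.

# Scene-graph facts as (relation, subject, object) triples.
facts = {
    ("on", "person1", "boat1"),
    ("holding", "person1", "umbrella1"),
    ("type", "boat1", "boat"),
    ("type", "umbrella1", "umbrella"),
    ("on", "person2", "dock1"),
}

# All entity symbols appearing in the facts.
entities = {s for _, s, _ in facts} | {o for _, _, o in facts}

def cond1(x):
    # cond1(X) :- on(X,Y), type(Y,boat).
    return any(("on", x, y) in facts and ("type", y, "boat") in facts
               for y in entities)

def cond2(x):
    # cond2(X) :- holding(X,Y), type(Y,umbrella).
    return any(("holding", x, y) in facts and ("type", y, "umbrella") in facts
               for y in entities)

def target(x):
    # target(X) :- cond1(X), cond2(X).
    return cond1(x) and cond2(x)

print(sorted(e for e in entities if target(e)))  # ['person1']
```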
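The 1200/400/400 split from the Dataset Splits row can be reproduced in shape, though not in the exact sample assignment, with a sketch like the following; the sample IDs and seed are placeholders.

```python
# A minimal sketch, assuming the ca. 2000 held-out DeiVG instances are
# shuffled once and split 1200/400/400 as reported. IDs and the seed are
# placeholders, not the paper's actual sample assignment.
import random

instances = [f"sample_{i:04d}" for i in range(2000)]
random.seed(42)  # hypothetical seed for determinism
random.shuffle(instances)

train, val, test = instances[:1200], instances[1200:1600], instances[1600:]
assert (len(train), len(val), len(test)) == (1200, 400, 400)
```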
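Finally, a hedged sketch collecting the hyperparameters reported in the Experiment Setup row into one place. The `DeiSAMConfig` class and its field names are hypothetical conveniences; the released code may organize these settings differently.

```python
# A hedged configuration sketch collecting the values from the Experiment
# Setup row. The DeiSAMConfig class and field names are hypothetical; the
# released code may organize these settings differently.
from dataclasses import dataclass

@dataclass
class DeiSAMConfig:
    # Default (training-free) pipeline components.
    llm: str = "gpt-3.5-turbo"        # rule generation
    embedder: str = "ada-002"         # semantic unification embeddings
    box_threshold: float = 0.3        # SAM box threshold
    text_threshold: float = 0.25      # SAM text threshold
    rule_weight: float = 1.0          # weight of each generated rule
    gamma: float = 0.01               # NEUMANN soft-logic parameter
    infer_steps: int = 2              # reasoner inference steps (default)
    # Trained-variant optimization settings.
    lr: float = 1e-2                  # RMSProp learning rate
    train_steps: int = 200            # weight-update steps
    batch_size: int = 1
    train_infer_steps: int = 4        # reasoner inference steps when training
    iou_threshold: float = 0.8        # IoU score threshold θ

config = DeiSAMConfig()
print(config)
```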