Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval
Authors: Dvir Samuel, Rami Ben-Ari, Matan Levy, Nir Darshan, Gal Chechik
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | PDM demonstrates superior performance on popular retrieval and segmentation benchmarks, outperforming even supervised methods. We also highlight notable shortcomings in current instance and segmentation datasets and propose new benchmarks for these tasks. |
| Researcher Affiliation | Collaboration | Bar-Ilan University, Israel; OriginAI, Israel; The Hebrew University of Jerusalem, Israel; NVIDIA Research, Israel |
| Pseudocode | No | The paper describes its methods in text and with diagrams (e.g., Figure 3), but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | Not currently. We use public datasets, so the data used is available. We are working on a formal approval to publicly release the code, upon acceptance. |
| Open Datasets | Yes | For the evaluation of PDM, we adopted traditional instance retrieval and one-shot segmentation benchmarks... We first evaluate PDM on the PerSeg [45] dataset... We further conducted experiments across two temporal one-shot image segmentation benchmarks. We conducted evaluations on the DAVIS17 dataset [27]... Initially, we assess our model's performance on the widely-used ROxford and RParis datasets [25, 26] with revised annotations [28]... Our proposed benchmarks are constructed using the recently introduced BURST dataset [4]... |
| Dataset Splits | Yes | For the personalized instance retrieval (PerMIR), we randomly chose three frames from each video, designating one as the query frame and the remaining two as the database (gallery) frames. For this, the PerMIR and ROxford-Hard datasets were split into 20% training and 80% test sets, and the weighted fusion parameters were optimized on the training sets. (A minimal sketch of this split follows the table.) |
| Hardware Specification | Yes | Using the SoTA inversion technique by [24] with Vanilla Stable Diffusion takes about 5 seconds for each image on a single A100. |
| Software Dependencies | No | The paper mentions software components like 'Vanilla Stable Diffusion', 'SDXL-turbo', and 'SAM [14]' but does not provide specific version numbers for general software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Therefore, for all our experiments, features were extracted from SDXL-turbo at the last U-Net layer at the first timestep t = 4. Furthermore, all images were resized to 512 x 512 for proper image inversion. We set τ, the threshold for Mr, to be 0.7 for all our experiments. (A feature-extraction sketch follows the table.) |
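
The Dataset Splits row describes a two-stage protocol: sample three frames per video (one query, two gallery), then split the resulting benchmark 20%/80% for tuning the fusion weights. Below is a minimal sketch of that protocol, assuming each video is given as a list of frame identifiers; the function names, data layout, and seed are hypothetical, as the paper does not release its sampling code.

```python
import random

def build_permir_split(videos, seed=0):
    """Sketch of the PerMIR-style sampling: three random frames per video,
    one used as the query and the other two as gallery (database) frames.
    `videos` is assumed to map video_id -> list of frame identifiers."""
    rng = random.Random(seed)
    queries, gallery = [], []
    for video_id, frames in videos.items():
        picked = rng.sample(frames, 3)          # three random frames
        queries.append((video_id, picked[0]))   # one query frame
        gallery.extend((video_id, f) for f in picked[1:])  # two gallery frames
    return queries, gallery

def train_test_split(items, train_frac=0.2, seed=0):
    """20%/80% train/test split used to optimize the weighted-fusion
    parameters on the training portion."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```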
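
The Experiment Setup row fixes the feature-extraction configuration: SDXL-turbo, last U-Net layer, first timestep t = 4, inputs resized to 512 x 512. The sketch below illustrates one way to realize this with a forward hook in `diffusers`. Several details are assumptions: the model id `stabilityai/sdxl-turbo`, reading "last U-Net layer" as the final up-block, and the empty prompt. The paper obtains latents via image inversion (per the Hardware Specification row); that step is omitted here, so this is a sketch of the feature hook, not the authors' full pipeline.

```python
import torch
from PIL import Image
from diffusers import AutoPipelineForText2Image

device = "cuda"
# Model id is an assumption; the paper only says "SDXL-turbo".
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo").to(device)

feats = {}
# "Last U-Net layer" is read here as the final up-block (an assumption).
pipe.unet.up_blocks[-1].register_forward_hook(
    lambda module, args, output: feats.update(last=output.detach())
)

@torch.no_grad()
def extract_pdm_features(image: Image.Image, prompt: str = "") -> torch.Tensor:
    # All images are resized to 512 x 512, as stated in the paper.
    image = image.convert("RGB").resize((512, 512))
    x = pipe.image_processor.preprocess(image).to(device)
    latents = pipe.vae.encode(x).latent_dist.mean * pipe.vae.config.scaling_factor

    # NOTE: the paper derives latents via image inversion before this step;
    # the inversion is omitted here for brevity.
    t = torch.tensor([4], device=device)  # "first timestep t = 4"
    prompt_embeds, _, pooled_embeds, _ = pipe.encode_prompt(
        prompt=prompt, device=device,
        num_images_per_prompt=1, do_classifier_free_guidance=False,
    )
    # SDXL micro-conditioning: (orig_h, orig_w, crop_top, crop_left, tgt_h, tgt_w).
    add_time_ids = torch.tensor([[512, 512, 0, 0, 512, 512]], device=device)
    pipe.unet(
        latents, t, encoder_hidden_states=prompt_embeds,
        added_cond_kwargs={"text_embeds": pooled_embeds, "time_ids": add_time_ids},
    )
    return feats["last"]  # spatial feature map captured by the hook
```

In PDM these spatial features would then be compared between the query and gallery images; the hook target and conditioning details above are guesses where the paper is silent.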