Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval

Authors: Dvir Samuel, Rami Ben-Ari, Matan Levy, Nir Darshan, Gal Chechik

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | PDM demonstrates superior performance on popular retrieval and segmentation benchmarks, outperforming even supervised methods. We also highlight notable shortcomings in current instance and segmentation datasets and propose new benchmarks for these tasks.
Researcher Affiliation | Collaboration | (1) Bar-Ilan University, Israel; (2) Origin AI, Israel; (3) The Hebrew University of Jerusalem, Israel; (4) NVIDIA Research, Israel
Pseudocode | No | The paper describes its methods in text and with diagrams (e.g., Figure 3), but it does not include structured pseudocode or algorithm blocks.
Open Source Code | No | Not currently. We use public datasets, so the data used is available. We are working on a formal approval to publicly release the code, upon acceptance.
Open Datasets | Yes | For the evaluation of PDM, we adopted traditional instance retrieval and one-shot segmentation benchmarks... We first evaluate PDM on the PerSeg [45] dataset... We further conducted experiments across two temporal one-shot image segmentation benchmarks. We conducted evaluations on the DAVIS17 dataset [27]... Initially, we assess our model's performance on the widely-used ROxford and RParis datasets [25, 26] with revised annotations [28]... Our proposed benchmarks are constructed using the recently introduced BURST dataset [4]...
Dataset Splits | Yes | For the personalized instance retrieval (PerMIR), we randomly chose three frames from each video, designating one as the query frame and the remaining two as the database (gallery) frames. For this, the PerMIR and ROxford-Hard datasets were split into 20% training and 80% test sets, and the weighted fusion parameters were optimized on the training sets. (This protocol is illustrated in the split sketch below the table.)
Hardware Specification | Yes | Using the SoTA inversion technique by [24] with Vanilla Stable Diffusion takes about 5 seconds for each image on a single A100.
Software Dependencies | No | The paper mentions software components like 'Vanilla Stable Diffusion', 'SDXL-turbo', and 'SAM [14]' but does not provide specific version numbers for general software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | Therefore, for all our experiments, features were extracted from SDXL-turbo at the last U-Net layer at the first timestep t = 4. Furthermore, all images were resized to 512 x 512 for proper image inversion. We set τ, the threshold for M_r, to be 0.7 for all our experiments. (See the feature-extraction sketch below the table.)
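
The frame-sampling and split protocol quoted in the Dataset Splits row can be illustrated with a short sketch. This is a hedged reconstruction, not the authors' code: the video/frame data structures, function names, and the random seed are assumptions.

```python
# Sketch of the quoted protocol: three random frames per video (one query,
# two gallery), then a 20%/80% split used to tune the weighted fusion parameters.
import random

def build_permir_frames(videos, seed=0):
    """videos: assumed dict mapping video_id -> list of frame identifiers (>= 3 frames each)."""
    rng = random.Random(seed)
    queries, gallery = [], []
    for vid, frames in videos.items():
        picked = rng.sample(frames, 3)                 # three random frames per video
        queries.append((vid, picked[0]))               # one designated query frame
        gallery.extend((vid, f) for f in picked[1:])   # two database (gallery) frames
    return queries, gallery

def split_for_fusion_tuning(queries, train_frac=0.2, seed=0):
    """20% train (to optimize the fusion weights) / 80% test, as described above."""
    rng = random.Random(seed)
    shuffled = queries[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```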
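
To make the Experiment Setup row more concrete, here is a minimal sketch of how one might capture a feature map from SDXL-turbo with a Hugging Face diffusers forward hook. It is an illustration under stated assumptions, not the authors' implementation: the model id, the choice of up_blocks[-1] as the "last U-Net layer", and the single image-to-image denoising step standing in for the inversion technique of [24] are all assumptions.

```python
import torch
from PIL import Image
from diffusers import AutoPipelineForImage2Image

# Assumed checkpoint for SDXL-turbo on the Hugging Face Hub.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

features = {}

def grab(module, inputs, output):
    # Store the spatial feature map produced by the hooked U-Net block.
    features["last_layer"] = output.detach().float().cpu()

# Assumed "last U-Net layer": the final up-block of the SDXL U-Net.
pipe.unet.up_blocks[-1].register_forward_hook(grab)

# Images are resized to 512 x 512, as stated in the setup quote.
image = Image.open("query.jpg").convert("RGB").resize((512, 512))

with torch.no_grad():
    # One denoising step (strength * num_inference_steps = 1); PDM obtains the
    # noisy latent via an inversion method [24], which is omitted here.
    pipe(prompt="", image=image, strength=0.25, guidance_scale=0.0,
         num_inference_steps=4)

print(features["last_layer"].shape)  # per-pixel diffusion features for matching
```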