Hierarchy-Agnostic Unsupervised Segmentation: Parsing Semantic Image Structure
Authors: Simone Rossetti, Fiora Pirri
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a new metric for estimating the quality of the semantic segmentation of discovered elements on different levels of the hierarchy. The metric validates the intrinsic nature of the compositional relations among parts, objects, and scenes in a hierarchy-agnostic domain. Our results prove the power of this methodology, uncovering semantic regions without prior definitions and scaling effectively across various datasets. This robust framework for unsupervised image segmentation proves more accurate semantic hierarchical relationships between scene elements than traditional algorithms. The experiments underscore its potential for broad applicability in image analysis tasks, showcasing its ability to deliver a detailed and unbiased segmentation that surpasses existing unsupervised methods. |
| Researcher Affiliation | Collaboration | Simone Rossetti (1,2), Fiora Pirri (1,2); (1) DIAG, Sapienza University of Rome; (2) Deep Plants; {rossetti,pirri}@diag.uniroma1.it, {simone,fiora}@deepplants.com |
| Pseudocode | Yes | In Appendix B, we discuss the algorithm's properties and the generated T, and present the complete pseudocode of our method. |
| Open Source Code | Yes | We provided code for reproducing experiments in Table 1 in the supplementary material. We will release the full code upon acceptance. |
| Open Datasets | Yes | We benchmark our algorithm on unsupervised multi-granular segmentation using seven major object- and scene-centric datasets and seven hierarchically structured datasets with varying granularity levels for hierarchy-agnostic segmentation. We only utilize publicly available datasets, SSL model checkpoints without retraining, and validation set ground-truth annotations. |
| Dataset Splits | Yes | We only utilize publicly available datasets, SSL model checkpoints without retraining, and validation set ground-truth annotations. |
| Hardware Specification | Yes | We ran experiments on an ASUS ESC8000 server with two AMD EPYC 7413 24-core processors and 256GB RAM. We used the PyTorch 2.3 deep learning framework and 2 NVIDIA A6000 GPUs with 48GB of VRAM to accelerate the feature extraction stage. |
| Software Dependencies | Yes | We used the PyTorch 2.3 deep learning framework and 2 NVIDIA A6000 GPUs with 48GB of VRAM to accelerate the feature extraction stage. |
| Experiment Setup | Yes | Unless otherwise specified, we use the DINOv2-ViT-B14-REG [22] backbone with parameters kmin = 1, pmax = 20, and λmax = 0.8. We apply the spectral method from Ng et al. [61] with m = 300 for superpixel clustering. The recursive partitioning depth is limited to 10 levels. Depending on each backbone's downsampling factor, input images are resized to extract 60×60 codes, except for urban street scenes, where we obtain 60×120 codes. |
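The superpixel clustering step in the Experiment Setup row can be sketched with off-the-shelf tooling. The snippet below is a minimal illustration, not the authors' released code: it assumes scikit-learn's `SpectralClustering` (whose k-means label assignment follows Ng et al. [61]) and uses random vectors in place of the DINOv2 patch codes; the grid size and cluster count are scaled down from the paper's 60×60 codes and m = 300 so the toy example runs quickly.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Stand-in for backbone patch features: the paper extracts a 60x60 grid of
# codes per image; here we fake a smaller set of 300 feature vectors.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 64))

# Ng et al.-style spectral clustering into m superpixel clusters.
# The paper uses m = 300; we use m = 10 to match the toy data size.
m = 10
clusterer = SpectralClustering(
    n_clusters=m,
    affinity="nearest_neighbors",  # sparse k-NN affinity graph over the codes
    n_neighbors=10,
    assign_labels="kmeans",        # k-means on the spectral embedding (Ng et al.)
    random_state=0,
)
labels = clusterer.fit_predict(features)
print(labels.shape)  # one cluster id per feature code
```

In the actual pipeline, the resulting cluster labels would be reshaped back onto the patch grid to form superpixels before the recursive partitioning stage.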