Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders
Authors: Savya Khosla, Sethuraman T V, Barnett Lee, Alex Schwing, Derek Hoiem
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate REN on semantic segmentation and retrieval tasks, where it consistently outperforms the original encoders in both performance and compactness, and matches or exceeds SAM-based region methods while being significantly faster. Notably, REN achieves state-of-the-art results on the challenging Ego4D VQ2D benchmark and outperforms proprietary LMMs on Visual Haystacks single-needle challenge. |
| Researcher Affiliation | Academia | University of Illinois Urbana-Champaign EMAIL |
| Pseudocode | No | The paper describes the method and architecture using text and diagrams (Figure 1), but does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | The code and pretrained models are available at https://github.com/savya08/ren. |
| Open Datasets | Yes | Table 9: Summary of datasets used in this work. All datasets used in this work are standard academic benchmarks and publicly available. Licensing information for each dataset is provided. Ego4D (Training, Visual Query Localization): MIT License; ADE20K (Semantic Segmentation): BSD-3-Clause License; VOC2012 (Semantic Segmentation): CC BY 2.5; COCO (Visual Haystacks, Image Retrieval): CC BY 4.0. |
| Dataset Splits | Yes | We evaluate REN on the Ego4D VQ2D benchmark, where the task is to localize the last occurrence of a query object in a long video. Following Shlapentokh-Rothman et al. [38], we use the COCO validation set as the image database, and 50 images with corresponding object masks serve as query instances for each object class. |
| Hardware Specification | Yes | This analysis is conducted on a single NVIDIA A40 using images from the ADE20K dataset [52]. We implement REN using PyTorch, with all training and evaluation performed on a single NVIDIA A40 GPU. |
| Software Dependencies | No | We implement REN using PyTorch, with all training and evaluation performed on a single NVIDIA A40 GPU. Token aggregation is then performed by constructing an adjacency graph using SciPy [19]. For SLIC-based prompting, we use Fast-SLIC [21]. |
| Experiment Setup | Yes | Training is performed on images sampled from the Ego4D dataset [13]. We use a batch size of 16 and randomly sample up to 256 point prompts per image to generate region tokens. ... The model is optimized using AdamW with a learning rate of 0.001, cosine decay schedule, 100 warmup steps, and gradient clipping with a maximum norm of 5.0. |
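
The optimization settings quoted in the Experiment Setup row (learning rate 0.001, cosine decay, 100 warmup steps, gradient clipping at norm 5.0) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code. The total step count, the linear shape of the warmup, and decay to zero are assumptions, since the quoted text does not state them; the names `lr_at` and `clip_by_global_norm` are hypothetical.

```python
import math

BASE_LR = 1e-3        # from the quoted setup
WARMUP_STEPS = 100    # from the quoted setup
MAX_GRAD_NORM = 5.0   # from the quoted setup
TOTAL_STEPS = 10_000  # ASSUMPTION: not given in the quoted text


def lr_at(step, total_steps=TOTAL_STEPS):
    """Learning rate at a 0-indexed step.

    ASSUMPTIONS: warmup is linear and the cosine phase decays to zero.
    """
    if step < WARMUP_STEPS:
        # Ramp up so that the last warmup step reaches BASE_LR.
        return BASE_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Rescale a flat list of gradient values so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

In a PyTorch training loop, the same behavior would typically be obtained with `torch.optim.AdamW`, a scheduler such as `torch.optim.lr_scheduler.LambdaLR`, and `torch.nn.utils.clip_grad_norm_`; whether the paper's code uses those exact utilities is not stated in the quoted text.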