Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Characterizing Vision Backbones for Dense Prediction with Dense Attentive Probing
Authors: Timo Lüddecke, Alexander S. Ecker
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we propose dense attentive probing, a parameter-efficient readout method for dense prediction on arbitrary backbones independent of the size and resolution of their feature volume. To this end, we extend cross-attention with distance-based masks of learnable sizes. We employ this method to evaluate 18 common backbones on dense prediction tasks in three dimensions: instance awareness, local semantics and spatial understanding. We find that DINOv2 outperforms all other backbones tested, including those supervised with masks and language, across all three task categories. |
| Researcher Affiliation | Academia | The provided paper text lists the authors 'Timo Lüddecke and Alexander Ecker' but does not explicitly state their institutional affiliations, departments, cities, countries, or email addresses. The affiliation type therefore cannot be classified from the provided text alone; academic affiliation is assumed by default, as is common for research papers. |
| Pseudocode | No | The paper describes the Dense Attentive Probing (DeAP) method in Section 3 with equations and a diagram (Figure 2), but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://eckerlab.org/code/deap. |
| Open Datasets | Yes | We use the COCO dataset (Lin et al., 2014), with the 5,000 images from the validation set being used for testing. For the instance discrimination task, we compute the ARI (adjusted rand index) test scores only on images with at least three large objects (resulting in a subset of 754 images). A natural choice for evaluating local semantics is a semantic segmentation task. Here we rely on two benchmarks: Pascal VOC 2012 (Everingham et al., 2015) and COCO Stuff (Caesar et al., 2018). ... We frame this as a depth map estimation problem, i. e. Dout = 1, relying on the NYUv2 dataset (Nathan Silberman & Fergus, 2012) for training and testing the depth estimation readout. |
| Dataset Splits | Yes | For these experiments, we use the COCO dataset (Lin et al., 2014), with the 5,000 images from the validation set being used for testing. For the instance discrimination task, we compute the ARI (adjusted rand index) test scores only on images with at least three large objects (resulting in a subset of 754 images). On COCO and Pascal we use the validation sets for testing, while model selection is carried out on a separate part of the training set via validation loss. |
| Hardware Specification | Yes | For example, our standard training for a readout on a ViT-B/16 224-pixel backbone on Pascal VOC adds less than 70,000 parameters and trains in less than 16 minutes (using a single Nvidia RTX 2080 GPU). |
| Software Dependencies | No | The paper mentions using 'PyTorch vision (Paszke et al., 2019)' and the 'timm package (Wightman, 2019)' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.001, except for boundary prediction and depth where it is set to 0.002. We use 8 attention heads in all models. On COCO and Pascal we use the validation sets for testing, while model selection is carried out on a separate part of the training set via validation loss. Per-task settings (BS, LR and WD correspond to batch size, learning rate and weight decay; all tasks use BS 32, LR 0.001, WD 0.010, 8 heads, dim 16): Pascal VOC2012: 6,000 iterations, val. interval 250, img size 224, base size 28; COCO Stuff: 20,000 iterations, val. interval 250, img size 224, base size 28; NYUv2 Depth: 3,000 iterations, val. interval 100, img size [216, 288], base size 28; Instance Discrimination: 20,000 iterations, val. interval -1, img size 224, base size 28; Boundaries: 10,000 iterations, val. interval 250, img size 448, base size 56; CenterNet: 20,000 iterations, val. interval 1000, img size 448, base size 56. |
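The paper's core mechanism, as quoted above, is "cross-attention with distance-based masks of learnable sizes." The authors' implementation is not reproduced in this report; below is a minimal NumPy sketch of the general idea only, where the function name, argument layout, and the hard-cutoff masking rule are our assumptions, not the published method:

```python
import numpy as np

def distance_masked_cross_attention(q_pos, k_pos, Q, K, V, radius):
    """Sketch: cross-attention where each query may only attend to
    backbone tokens within a (learnable) spatial radius.

    q_pos: (Nq, 2) query grid coordinates, k_pos: (Nk, 2) token coordinates,
    Q: (Nq, d) queries, K: (Nk, d) keys, V: (Nk, dv) values.
    """
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)                       # (Nq, Nk) scaled dot-product scores
    dist = np.linalg.norm(q_pos[:, None] - k_pos[None], axis=-1)
    logits = np.where(dist <= radius, logits, -1e9)     # mask tokens beyond the radius
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)               # softmax over the unmasked tokens
    return w @ V                                        # (Nq, dv) readout features
```

In a trainable version, `radius` would be a learned parameter (the "learnable sizes" in the abstract), likely with a soft rather than hard cutoff so gradients can flow; the sketch uses a hard cutoff purely for clarity.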