Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Exploring Structural Degradation in Dense Representations for Self-supervised Learning

Authors: Siran Dai, Qianqian Xu, Peisong Wen, Yang Liu, Qingming Huang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on sixteen state-of-the-art methods across four benchmarks confirm that the SDD phenomenon consistently appears across diverse training approaches and evaluation protocols. More importantly, it persists even when training and evaluation are conducted on the same dataset. This demonstrates that SDD highlights the performance inconsistency between different tasks rather than overfitting to the data distribution, introducing a new challenge to the SSL community. ... Empirical evaluations demonstrate that the proposed DSE accurately predicts downstream performance, significantly outperforming existing metrics.
Researcher Affiliation	Academia	1 Institute of Information Engineering, Chinese Academy of Sciences; 2 School of Cyber Security, University of Chinese Academy of Sciences; 3 State Key Laboratory of AI Safety, Institute of Computing Technology, CAS; 4 Peng Cheng Laboratory; 5 School of Computer Science and Tech., University of Chinese Academy of Sciences
Pseudocode	Yes	Algorithm 1 DSE-based Model Selection
Open Source Code	Yes	Code is available at https://github.com/EldercatSAM/SSL-Degradation.
Open Datasets	Yes	All models are trained for 800 epochs on ImageNet-1k [39]... We evaluate four benchmarks: COCO-Stuff27 [8], PASCAL VOC [21], ADE20k [82], and Cityscapes [17]. ... We evaluate the semi-supervised video object segmentation on the DAVIS 2017 dataset [52] ... We conduct an additional experiments on depth estimation... on NYU-depth v2 dataset [47].
Dataset Splits	Yes	We following the default data splits for the datasets used. Other details are provided in Appendix D. ... To assess dense representation quality, we adopt the standard linear evaluation protocol standard in SSL [50, 86, 83, 84].
Hardware Specification	Yes	All pretraining are conducted on 8 NVIDIA A100 GPUs, the evaluation is done on 8 NVIDIA 4090 GPUs, and the DSE metric is computed on a single NVIDIA 4090 GPU.
Software Dependencies	No	The head is optimized using a batch size of 256 (64 for Cityscapes due to GPU memory limitation) and a learning rate of 0.01 * batch size/256 using an Adam [38] optimizer. ... Algorithm 2 An example PyTorch pseudocode of DINO with DSE-regularized training. (No specific versions for PyTorch, Adam, or other libraries are provided.)
Experiment Setup	Yes	The full pretraining hyperparameters are listed in Tab. 4. ... Input images are resized to 336 x 336 (except 896 x 896 for Cityscapes to preserve detail) and the classification head is trained on 100,000 images. The head is optimized using a batch size of 256 (64 for Cityscapes due to GPU memory limitation) and a learning rate of 0.01 * sqrt(batch size/256) using an Adam [38] optimizer.
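As a worked illustration of the learning-rate rule quoted in the Experiment Setup row, the sketch below evaluates 0.01 * sqrt(batch size/256) for the two batch sizes mentioned (256 by default, 64 for Cityscapes). It assumes the square-root form from that row is the intended one; the Software Dependencies row quotes the same rule without the square root. The function name `scaled_lr` and its parameter names are hypothetical and do not come from the paper or its repository.

```python
import math


def scaled_lr(batch_size: int, base_lr: float = 0.01, base_batch: int = 256) -> float:
    """Square-root learning-rate scaling: base_lr * sqrt(batch_size / base_batch).

    Assumption: this follows the formula as quoted in the Experiment Setup row;
    it is not taken from the authors' code.
    """
    return base_lr * math.sqrt(batch_size / base_batch)


if __name__ == "__main__":
    # Batch sizes as stated in the table: 256 by default, 64 for Cityscapes.
    for name, batch_size in [("default", 256), ("Cityscapes", 64)]:
        print(f"{name}: batch size {batch_size}, learning rate {scaled_lr(batch_size):.4f}")
```

Under this reading, the Cityscapes head would train at half the default learning rate, since sqrt(64/256) = 0.5. Under the linear form quoted in the other row it would instead be a quarter of the default.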