Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2
Authors: Joel Valdivia Ortega, Lorenz Lamm, Franziska Eckardt, Benedikt Schworm, Marion Jasnin, Tingying Peng
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we introduce Randomized-MLP (RMLP) regularization, a contrastive learning-based method that encourages more semantically aligned representations. We use RMLPs when fine-tuning DINOv2 to both medical and natural image modalities, showing that it improves or maintains downstream performance while producing more interpretable attention maps. We also provide a mathematical analysis of RMLPs, offering insights into its role in enhancing ViT-based models and advancing our understanding of contrastive learning. |
| Researcher Affiliation | Academia | (1) Helmholtz AI, Helmholtz Munich, Neuherberg, Germany; (2) Helmholtz Pioneer Campus, Helmholtz Munich, Neuherberg, Germany; (3) School of Computation, Information and Technology, TUM, Garching, Germany; (4) Biozentrum, University of Basel, Basel, Switzerland; (5) Department of Ophthalmology, LMU University Hospital, LMU Munich, Munich, Germany; (6) Department of Chemistry, TUM, Garching, Germany. |
| Pseudocode | No | The paper describes the RMLP method using mathematical equations (Eq. 1 and Eq. 2) within Section 3, and discusses the theoretical analysis in Section 4. However, it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and pre-trained models are available at https://github.com/peng-lab/rmlp. |
| Open Datasets | Yes | A recollection of all the datasets used in this work can be found in Table 7a, both for natural and medical domains. To show the geographic diversity from our geographical dataset, we show the country of origin of the medical datasets used in this paper in Table 7b. Table 7: Used datasets, licenses and country of origin. (a) Used datasets and their licenses, listed as Name / License / Repository: ImageNet-1k [34] / CC0: Public Domain / ImageNet-1k; NYU-Depth V2 [36] / MIT / NYU-Depth V2; ADE20k [45] / BSD-3-Clause / ADE20k; BSDS300 [27] / Non-commercial research / BSDS300; VOC07 [13] / Custom / PASCAL VOC 2007; OCTID [14] / CC0 1.0 / OCTID; Glaucoma Fundus [2] / CC0 1.0 / Glaucoma Fundus; IDRID [31] / Open Access / IDRID; JSIEC [7] / Open Access / JSIEC; MESSIDOR-2 [1, 10] / Non-commercial research / MESSIDOR-2; PAPILA [22] / GPL 3.0+ / PAPILA; Retina [6] / Open Access / Retina; Aptos [3] / Custom / Aptos; Eckardt, et al. [12] / Property of LMU University Hospital / - |
| Dataset Splits | Yes | For datasets without predefined splits, we manually partitioned them into 70% training, 15% validation, and 15% testing. |
| Hardware Specification | Yes | All training was performed on a single GPU (Quadro RTX 8000 or NVIDIA A100-SXM4-40GB) and required approximately 15 hours per trained backbone, reflecting the low computational cost of our approach and its reduced environmental footprint. |
| Software Dependencies | No | All models were trained using the AdamW optimizer [24], which we employed consistently across fine-tuning stages as well as during training of downstream linear heads and UNet decoders. Table 5 shows the main hyperparameters used when fine-tuning DINOv2-S on natural, OCT and CFP modalities. Further implementation details can be found in our code. |
| Experiment Setup | Yes | Table 5: Hyperparameters used for fine-tuning DINOv2-S to obtain C-ViT and Rλ-ViT, listed as Hyperparameter: Value. Optimizer: AdamW [24]; Plateau size for early stop: 10 epochs; Batch size: 32; Token's dimension: 384; DINO coefficient: 1; iBOT coefficient: 1; KoLeo coefficient: 0.5; Initial learning rate: 1e-7; Patience/factor for learning rate scheduler: 3/0.4; Minimum learning rate: 1e-8; Hidden/Bottleneck/Output dimensions for MLPs and RMLPs: 1536/256/65536; Number of transformer blocks: 12; Patch size: 14; Crop size: 224; Steps per epoch: 100; Warm up epochs: 10. |
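
The Dataset Splits and Experiment Setup rows describe two procedures concrete enough to sketch: a 70%/15%/15% partition for datasets without predefined splits, and a learning-rate schedule with initial rate 1e-7, patience 3, factor 0.4, and minimum rate 1e-8. The following is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation. The function names `split_indices` and `plateau_schedule`, the random shuffle, the seed, and the choice of a reduce-on-plateau rule (the table does not name the scheduler type) are all hypothetical.

```python
import random


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle sample indices and cut them into train/val/test lists.

    The 70/15/15 proportions come from the table; shuffling with a fixed
    seed is an assumption made here for repeatability.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder absorbs any rounding
    return train, val, test


def plateau_schedule(val_losses, lr=1e-7, patience=3, factor=0.4, min_lr=1e-8):
    """Return the learning rate in effect at each epoch.

    Assumed rule: once the validation loss has failed to improve for more
    than `patience` consecutive epochs, multiply the rate by `factor`,
    never going below `min_lr`. Default values are taken from the table.
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        history.append(lr)
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                lr = max(lr * factor, min_lr)
                bad_epochs = 0
    return history


if __name__ == "__main__":
    train, val, test = split_indices(1000)
    print(len(train), len(val), len(test))
    # A flat validation loss triggers repeated reductions until min_lr is hit.
    print(plateau_schedule([1.0] * 14))
```

Under these assumptions, two reductions take the rate from 1e-7 to 4e-8 and then 1.6e-8; a third would give 6.4e-9, which is below the reported minimum, so the rate is held at 1e-8. The authors' repository remains the authoritative reference for the actual splitting and scheduling code.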