Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
Authors: Changdae Oh, Yixuan Li, Kyungwoo Song, Sangdoo Yun, Dongyoon Han
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate DaWin on large-scale visual recognition benchmarks spanning 14 tasks: robust fine-tuning on ImageNet and its five derived distribution shift benchmarks, and multi-task learning with eight classification tasks. Results demonstrate that DaWin achieves significant performance gains in the considered settings with minimal computational overhead. |
| Researcher Affiliation | Collaboration | 1University of Wisconsin-Madison, 2Yonsei University, 3NAVER AI Lab |
| Pseudocode | Yes | Algorithm 1: Procedure for training-free dynamic weight interpolation (DaWin) |
| Open Source Code | Yes | Here is our code. |
| Open Datasets | Yes | We use ImageNet-1K (Russakovsky et al., 2015) and its five OOD variants, ImageNet-V2 (Recht et al., 2019), ImageNet-R (Hendrycks et al., 2021a), ImageNet-A (Hendrycks et al., 2021b), ImageNet-Sketch (Wang et al., 2019), and ObjectNet (Barbu et al., 2019) for evaluating robustness under distribution shifts. For multi-task learning, we follow the standard evaluation protocol (Ilharco et al., 2022; Yang et al., 2024b) using eight benchmark datasets: SUN397 (Xiao et al., 2016), Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2019), SVHN (Yuval, 2011), GTSRB (Stallkamp et al., 2011), MNIST (LeCun, 1998), and DTD (Cimpoi et al., 2014). |
| Dataset Splits | Yes | For multi-task learning, we follow the standard evaluation protocol (Ilharco et al., 2022; Yang et al., 2024b) using eight benchmark datasets... For fine-tuning CLIPs on ImageNet, Wortsman et al. (2022a) and its successor (Jang et al., 2024) conducted multiple training runs with different training configurations, such as data augmentation, learning rate, weight decay, and random initialization seeds, given fixed epochs (16) and batch size (512). |
| Hardware Specification | Yes | Here, we use the ViT-B/32 backbone model on NVIDIA A100 GPU(s). |
| Software Dependencies | No | The paper mentions using CLIP and the NSML platform, but does not provide specific version numbers for software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | For fine-tuning CLIPs on ImageNet, Wortsman et al. (2022a) and its successor (Jang et al., 2024) conducted multiple training runs with different training configurations, such as data augmentation, learning rate, weight decay, and random initialization seeds, given fixed epochs (16) and batch size (512)... For DaWin, we set K to 3, 5, 2 for ViT-{B/32, B/16, L/14} in the robust fine-tuning setup and K = 1 in the multi-task setups... temperature scaling (Guo et al., 2017) is applied in the robust fine-tuning setup with the ID validation set... scaling term (set to 0.3 following Ilharco et al. (2023)). |
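To make the reported setup concrete, below is a minimal, hedged sketch of the core idea behind training-free dynamic weight interpolation: compute a per-sample coefficient from the prediction entropies of the zero-shot and fine-tuned models, then interpolate their weights with that coefficient. The exact coefficient formula here (a softmax over negative entropies) and the function names are illustrative assumptions, not the paper's verbatim procedure; the paper additionally clusters coefficients into K groups (the K values in the table) to limit how many interpolated models must be built.

```python
import numpy as np

def entropy(probs):
    # Shannon entropy of each row of an (N, C) probability matrix.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def dawin_coefficients(p_zs, p_ft):
    # Per-sample interpolation coefficient in [0, 1]: the model that is
    # more confident (lower entropy) on a given sample gets more weight.
    # NOTE: softmax-over-negative-entropy form is an illustrative assumption.
    h_zs, h_ft = entropy(p_zs), entropy(p_ft)
    return np.exp(-h_ft) / (np.exp(-h_zs) + np.exp(-h_ft))

def interpolate_weights(theta_zs, theta_ft, lam):
    # lam = 0 -> purely zero-shot weights; lam = 1 -> purely fine-tuned.
    return {k: (1.0 - lam) * theta_zs[k] + lam * theta_ft[k] for k in theta_zs}
```

In practice one would pool the per-sample coefficients (e.g. via K-means with the K reported above) and build one interpolated model per cluster, so inference stays training-free and cheap.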