Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Morphing Tokens Draw Strong Masked Image Models
Authors: Taekyung Kim, Byeongho Heo, Dongyoon Han
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on ImageNet-1K and ADE20K demonstrate DTM's superiority, which surpasses complex state-of-the-art MIM methods. Furthermore, the evaluation of transfer learning on downstream tasks like iNaturalist, along with extensive empirical studies, supports DTM's effectiveness. |
| Researcher Affiliation | Industry | Taekyung Kim, Byeongho Heo, Dongyoon Han (NAVER AI Lab) |
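| Pseudocode | Yes | Algorithm 1: Token Morphing Function ($\phi_R$). 1: **input:** token representations $\{v_i\}_{i=1}^{N}$, iteration count $k$, scheduler $R = \{r_p\}_{p=1}^{k}$; 2: define $n \leftarrow N$; 3: define $v_i^0 \leftarrow v_i$ for $i \in [1, N]$; 4: **for** $p \in \{1, \dots, k\}$ **do** (k-iterative morphing); 5: $M^p \leftarrow \mathrm{BipartiteMatching}(v^p, n)$; 6: $M^p_{ij} \leftarrow M^p_{ij} / \sum_{j'=1}^{n} M^p_{ij'}$ for all $i, j$ (normalize); 7: $v_i^{p+1} \leftarrow \sum_{j=1}^{n} M^p_{ij} v_j^p$ for $i \in [1, n - r_p]$ (morph matched tokens); 8: $n \leftarrow n - r_p$; 9: **return** $M = \prod_{p=1}^{k} M^p$; 10: **function** $\mathrm{BipartiteMatching}(v^p, n)$ (standard bipartite matching algorithm); 11: $(S_1^p, S_2^p) \leftarrow \text{random split}([1, 2, \dots, n])$ (split for bipartite matching); 12: $\mathrm{sim} \leftarrow [\mathrm{Sim}(v_i^p, v_j^p) \text{ for } (i, j) \in S_1^p \times S_2^p]$ (measure similarity); 13: $\sigma \leftarrow \mathrm{sort}(\mathrm{sim}, \text{order} = \text{descending})[r_p]$ (threshold for top-$r_p$ similarity); 14: $M^p_{ij} \leftarrow 1$; $M^p \leftarrow M^p \setminus M^p_{j}$ s.t. $\mathrm{Sim}(v_i^p, v_j^p) \geq \sigma$, $(i, j) \in S_1^p \times S_2^p$; 15: **return** $M^p$; 16: **end function** |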
| Open Source Code | Yes | Code is available at https://github.com/naver-ai/dtm. |
| Open Datasets | Yes | Experiments on ImageNet-1K and ADE20K demonstrate DTM's superiority... The effectiveness of our method is supported by accelerated fine-tuning trends after DTM pre-training, which highlights how spatially consistent targets are crucial. Our method shows further transferability on the iNaturalist (Van Horn et al., 2018) and fine-grained visual classification datasets (Van Horn et al., 2015; Krizhevsky, 2009; Khosla et al., 2011). |
| Dataset Splits | Yes | Fine-tuning on ImageNet-1K. We fine-tune our pre-trained models on ImageNet-1K (Russakovsky et al., 2015) by default following the standard protocol (He et al., 2022; Peng et al., 2022). Fine-tuning on ADE20K. Table K summarizes the fine-tuning recipe of ViT/16 for the semantic segmentation task on ADE20K (Zhou et al., 2017). Transfer learning. We follow the fine-tuning recipes for DTM to conduct transfer learning to iNaturalist datasets... and FGVC datasets... |
| Hardware Specification | Yes | The model is fine-tuned using 8 V100-32GB GPUs. |
| Software Dependencies | No | We train our framework with ViT-S/16, ViT-B/16, and ViT-L/16 for 300 epochs using AdamW with momentum (0.9, 0.98) and a batch size of 1024. ... We adopt commonly used values for RandAugment, Mixup, CutMix, and Label Smoothing. |
| Experiment Setup | Yes | Table I reports the implementation details for pre-training. We train our framework with ViT-S/16, ViT-B/16, and ViT-L/16 for 300 epochs using AdamW with momentum (0.9, 0.98) and a batch size of 1024. We use a learning rate of $1.5 \times 10^{-4}$ with cosine decay and warmup 10 epochs. ... Fine-tuning on ImageNet-1K. We fine-tune our pre-trained models on ImageNet-1K (Russakovsky et al., 2015) by default following the standard protocol (He et al., 2022; Peng et al., 2022). Specifically, pre-trained ViT-S/-B/-L are fine-tuned for 300, 100, and 50 epochs, respectively. Optimization is performed with AdamW using a weight decay of 0.05. We use a layer-wise learning rate decay of 0.6 for ViT-S and ViT-B and 0.8 for ViT-L. Learning rate is set to $5 \times 10^{-4}$ with a linear warmup for 10 epochs for ViT-S and ViT-B and 5 epochs for ViT-L. We adopt commonly used values for RandAugment, Mixup, CutMix, and Label Smoothing. |
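For readers reconstructing Algorithm 1 from the Pseudocode row, the following is a minimal pure-Python sketch, not the authors' implementation (their code is at https://github.com/naver-ai/dtm). It assumes cosine similarity for Sim(·,·) and swaps the global top-$r_p$ threshold of step 14 for ToMe-style matching: each token in $S_1^p$ picks its single most similar partner in $S_2^p$, and the $r_p$ best-scoring tokens are merged into their partners. The helper names `cosine`, `bipartite_matching`, `matmul`, and `token_morphing` are hypothetical. Each $M^p$ is row-normalized with shape $(n - r_p) \times n$, and the accumulated product $M = M^k \cdots M^1$ maps the $N$ input tokens to the final morphed tokens.

```python
import math
import random


def cosine(a, b):
    # Assumed stand-in for Sim(., .): cosine similarity of two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def matmul(a, b):
    # Plain list-of-lists matrix product, (p x q) @ (q x s) -> (p x s).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def bipartite_matching(tokens, r, rng):
    """Hypothetical ToMe-style matching: returns a row-normalized (n - r) x n matrix."""
    n = len(tokens)
    idx = list(range(n))
    rng.shuffle(idx)                      # random split into S1 and S2
    s1, s2 = idx[: n // 2], idx[n // 2:]
    if not 0 <= r <= len(s1):
        raise ValueError("r must not exceed the size of S1")
    # Each S1 token finds its most similar S2 token.
    best = {i: max(s2, key=lambda j: cosine(tokens[i], tokens[j])) for i in s1}
    score = {i: cosine(tokens[i], tokens[best[i]]) for i in s1}
    # The r S1 tokens with the highest match score are merged away.
    merged = set(sorted(s1, key=lambda i: score[i], reverse=True)[:r])
    kept = [i for i in range(n) if i not in merged]
    row_of = {i: row for row, i in enumerate(kept)}
    m = [[0.0] * n for _ in kept]
    for i in kept:
        m[row_of[i]][i] = 1.0             # every surviving token keeps itself
    for i in merged:
        m[row_of[best[i]]][i] = 1.0       # merged token joins its S2 partner
    for row in m:                         # normalize rows (step 6)
        total = sum(row)
        for j in range(n):
            row[j] /= total
    return m


def token_morphing(tokens, schedule, seed=0):
    """Apply k morphing steps; returns (M, morphed tokens)."""
    rng = random.Random(seed)
    n = len(tokens)
    big_m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    v = [list(t) for t in tokens]
    for r in schedule:
        m = bipartite_matching(v, r, rng)
        v = matmul(m, v)                  # v^{p+1} = M^p v^p (step 7)
        big_m = matmul(m, big_m)          # accumulate M = M^k ... M^1 (step 9)
    return big_m, v
```

Because every input token lands in exactly one group, each column of the returned `M` has a single non-zero entry and each row sums to 1, so applying `M` directly to the original tokens reproduces the morphed tokens.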
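The pre-training schedule in the Experiment Setup row (peak learning rate $1.5 \times 10^{-4}$, 10 warmup epochs, cosine decay over 300 epochs) can be sketched as a per-epoch function. This sketch assumes a linear warmup starting from 0 and a final learning rate of 0; neither detail is stated in the row, and `lr_at` is a hypothetical name.

```python
import math


def lr_at(epoch, base_lr=1.5e-4, warmup_epochs=10, total_epochs=300, min_lr=0.0):
    """Linear warmup followed by cosine decay (min_lr=0 is an assumption)."""
    if epoch < warmup_epochs:
        # Linear ramp from 0 to base_lr over the warmup epochs.
        return base_lr * epoch / warmup_epochs
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + (base_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Passing a fractional epoch (for example `step / steps_per_epoch`) turns this into a per-iteration schedule, which is a common choice in MIM codebases; which granularity DTM uses is not stated here.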
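Layer-wise learning rate decay for fine-tuning (0.6 for ViT-S and ViT-B, 0.8 for ViT-L, base rate $5 \times 10^{-4}$) is commonly implemented by scaling each layer's rate by a power of the decay factor, so layers closer to the input are updated more gently. The sketch below follows the layer-id convention of the MAE codebase (patch embedding as layer 0, transformer blocks as layers 1 to $L$, head as layer $L+1$ with scale 1); whether DTM assigns layer ids the same way is an assumption, and `layerwise_lrs` and `CONFIGS` are hypothetical names.

```python
def layerwise_lrs(num_blocks, base_lr, decay):
    """Per-layer learning rates: index 0 = patch embedding, 1..num_blocks = blocks, last = head."""
    num_layers = num_blocks + 1  # transformer blocks plus the embedding layer
    return [base_lr * decay ** (num_layers - layer_id) for layer_id in range(num_layers + 1)]


# (number of transformer blocks, layer-wise decay) per model size, from the setup above.
CONFIGS = {
    "ViT-S": (12, 0.6),
    "ViT-B": (12, 0.6),
    "ViT-L": (24, 0.8),
}

if __name__ == "__main__":
    for name, (blocks, decay) in CONFIGS.items():
        lrs = layerwise_lrs(blocks, 5e-4, decay)
        print(f"{name}: embedding {lrs[0]:.2e}, head {lrs[-1]:.2e}")
```

For ViT-B, `layerwise_lrs(12, 5e-4, 0.6)` gives the head $5 \times 10^{-4}$ and the patch embedding $5 \times 10^{-4} \times 0.6^{13}$, roughly $6.5 \times 10^{-7}$; the milder 0.8 decay for the 24-block ViT-L keeps its early layers from being frozen almost entirely.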