Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Differentiable Hierarchical Visual Tokenization
Authors: Marius Aasan, Martine Hjelkrem-Tan, Nico Catalano, Changkyu Choi, Adín Ramírez Rivera
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design our experiments to investigate the representation capabilities of our method’s extracted tokens in multiple settings, including end-to-end learning with classification on ImageNet1k [31], transfer learning as a drop-in tokenizer replacement for pretrained ViTs (cf. Section 3.1), and demonstrating decoder-free segmentation models with learnable tokenization (cf. Section 3.2). Moreover, HT can also be evaluated on learnable image vectorization, and we compare our method to learnable image vectorization models (cf. Section 3.2). Training setup is detailed in Appendix D. |
| Researcher Affiliation | Academia | Marius Aasan¹, Martine Hjelkrem-Tan¹, Nico Catalano², Changkyu Choi³, Adín Ramírez Rivera¹. ¹University of Oslo, Department of Informatics; ²Polytechnic University of Milan, Artificial Intelligence and Robotics Lab; ³UiT The Arctic University of Norway, Department of Physics and Technology |
| Pseudocode | Yes | Figure D.1: Core algorithms for HT. Left: Single iteration of hierarchical vertex merging with kernel-weighted aggregation. Right: Differentiable feature extraction with mean-injection and adaptive masking. Algorithm D.1 Single Merge Iteration Algorithm D.2 Feature Extraction |
| Open Source Code | Yes | Code and model weights: https://github.com/dsb-ifi/dHT |
| Open Datasets | Yes | We focus on transformer baselines trained exclusively on ImageNet1k [31], and validate on various downstream tasks [32–36]. In addition to reporting top-1 accuracy scores, we perform a kNN evaluation to assess the quality of the representation space. Table 3: Single Scale Semantic Segmentation mIoU results on ADE20k [41] and COCO-Stuff164k [42]. |
| Dataset Splits | Yes | We focus on transformer baselines trained exclusively on ImageNet1k [31], and validate on various downstream tasks [32–36]. Segmentation Fine Tuning: Given our fully trained HT models, we perform fine tuning for semantic segmentation. We replace each head with a single hidden-layer MLP with a hidden ratio of 4. The fine tuning is performed using the configuration in Table D.1(d), and results are reported in Table 3. |
| Hardware Specification | Yes | Training and inference was performed on AMD MI250x and Nvidia A100. |
| Software Dependencies | No | The paper does not explicitly state specific version numbers for software dependencies such as Python, PyTorch, or other libraries. It only mentions general tools like the AdamW optimizer without a version. |
| Experiment Setup | Yes | Table D.1: Configuration parameters for different stages. (a) Pretraining: batch size 2048, epochs 400, img. size 192 × 192, pos. emb. 16 × 16, loss fn. CE (0.1 smooth.), optimizer LAMB, lr sched. cos. decay (5 w.u.), lr (start / base / stop) 3e-3 / 3e-7 / 1e-6, momentum 0.9, dropout path 0.1 (S) / 0.2 (B), opt. ε 1e-7, cutmix α 1.0, augment rand.aug. / aug3. (b) Tokenizer Retrofitting: batch size 2048, epochs 100, img. size 192 × 192, pos. emb. 16 × 16, loss fn. CE (0.1 smooth.), optimizer LAMB, lr sched. cos. decay (5 w.u.), lr (start / base / stop) 1e-7 / 6e-5 / 1e-6, momentum 0.9, dropout path 0.1 (S) / 0.2 (B), opt. ε 1e-8, augment rand.aug. / aug3, llrd 0.65. (c) Finetuning: batch size 512, epochs 100, img. size 224 × 224, pos. emb. 24 × 24, loss fn. CE (0.1 smooth.), optimizer AdamW, lr sched. cos. decay (5 w.u.), lr (start / base / stop) 1e-6 / 1e-5 / 1e-5, dropout path 0.1 (S) / 0.2 (B), opt. ε 1e-8, augment rand.aug. / aug3, llrd 0.9. (d) Segmentation Finetuning: batch size 512, epochs 400, img. size 512 × 512, pos. emb. 48 × 48, loss fn. BCE + Focal, optimizer AdamW, lr sched. cos. decay (5 w.u.), lr (start / base / stop) 1e-6 / 1e-5 / 1e-5, dropout path 0.1 (S) / 0.2 (B), opt. ε 1e-8, augment rand.aug. / aug3, crop scale / ratio (0.5, 1.0) / (0.8, 1.2), llrd 0.85. |
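
The learning-rate entries in the Experiment Setup row can be read as a warm-up followed by cosine decay. The sketch below is an illustration only, not the authors' training code: it uses the tokenizer retrofitting values from Table D.1(b) and assumes that "5 w.u." means five warm-up epochs, that the warm-up is linear, and that "llrd" is a layer-wise learning-rate decay factor applied geometrically from the last layer backwards. The names `CONFIG_B`, `lr_at_epoch`, and `layerwise_scales` are hypothetical.

```python
import math

# Values copied from Table D.1(b). The key names are hypothetical, and reading
# "5 w.u." as five warm-up epochs is an assumption.
CONFIG_B = {
    "epochs": 100,
    "warmup_epochs": 5,
    "lr_start": 1e-7,
    "lr_base": 6e-5,
    "lr_stop": 1e-6,
    "llrd": 0.65,
}


def lr_at_epoch(epoch, cfg):
    """Linear warm-up from lr_start to lr_base, then cosine decay to lr_stop."""
    warmup = cfg["warmup_epochs"]
    if epoch < warmup:
        frac = epoch / warmup
        return cfg["lr_start"] + frac * (cfg["lr_base"] - cfg["lr_start"])
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min((epoch - warmup) / max(1, cfg["epochs"] - warmup), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return cfg["lr_stop"] + cosine * (cfg["lr_base"] - cfg["lr_stop"])


def layerwise_scales(num_layers, decay):
    """Per-layer multipliers: the last layer gets 1.0, earlier layers shrink geometrically."""
    return [decay ** (num_layers - 1 - i) for i in range(num_layers)]
```

Under these assumptions, `lr_at_epoch(0, CONFIG_B)` returns the start rate, epoch 5 reaches the base rate, and the final epoch lands on the stop rate. The depth passed to `layerwise_scales` (for example 12) is a hypothetical choice; the report does not state the number of layers.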