Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Differentiable Hierarchical Visual Tokenization

Authors: Marius Aasan, Martine Hjelkrem-Tan, Nico Catalano, Changkyu Choi, Adín Ramírez Rivera

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We design our experiments to investigate the representation capabilities of our method’s extracted tokens in multiple settings; including end-to-end learning with classification on ImageNet1k [31], transfer learning as a drop-in tokenizer replacement for pretrained ViTs (cf. Section 3.1), and demonstrating decoder-free segmentation models with learnable tokenization (cf. Section 3.2). Moreover, HT can also be evaluated on learnable image vectorization, and we compare our method to learnable image vectorization models (cf. Section 3.2). Training setup is detailed in Appendix D.
Researcher Affiliation	Academia	Marius Aasan¹, Martine Hjelkrem-Tan¹, Nico Catalano², Changkyu Choi³, Adín Ramírez Rivera¹. ¹University of Oslo, Department of Informatics; ²Polytechnic University of Milan, Artificial Intelligence and Robotics Lab; ³UiT The Arctic University of Norway, Department of Physics and Technology
Pseudocode	Yes	Figure D.1: Core algorithms for HT. Left: Single iteration of hierarchical vertex merging with kernel-weighted aggregation. Right: Differentiable feature extraction with mean-injection and adaptive masking. Algorithm D.1: Single Merge Iteration. Algorithm D.2: Feature Extraction.
Open Source Code	Yes	Code and model weights: https://github.com/dsb-ifi/dHT
Open Datasets	Yes	We focus on transformer baselines trained exclusively on ImageNet1k [31], and validate on various downstream tasks [32–36]. In addition to reporting top-1 accuracy scores, we perform a kNN evaluation to assess the quality of the representation space. Table 3: Single Scale Semantic Segmentation mIoU results on ADE20k [41] and COCO-Stuff164k [42].
Dataset Splits	Yes	We focus on transformer baselines trained exclusively on ImageNet1k [31], and validate on various downstream tasks [32–36]. Segmentation Fine Tuning: Given our fully trained HT models, we perform fine tuning for semantic segmentation. We replace each head with a single hidden-layer MLP with a hidden ratio of 4. The fine tuning is performed using the configuration in Table D.1(d), and results are reported in Table 3.
Hardware Specification	Yes	Training and inference was performed on AMD MI250x and Nvidia A100.
Software Dependencies	No	The paper does not explicitly state specific version numbers for software dependencies such as Python, PyTorch, or other libraries. It only mentions general tools like the AdamW optimizer without a version.
Experiment Setup	Yes	Table D.1: Configuration parameters for different stages. (a) Pretraining: batch size 2048; epochs 400; img. size 192 × 192; pos. emb. 16 × 16; loss fn. CE (0.1 smooth.); optimizer LAMB; lr sched. cos. decay (5 w.u.); lr (start / base / stop) 3e-3 / 3e-7 / 1e-6; momentum 0.9; dropout path 0.1 (S) / 0.2 (B); opt. ε 1e-7; cutmix α 1.0; augment rand.aug. / aug3. (b) Tokenizer Retrofitting: batch size 2048; epochs 100; img. size 192 × 192; pos. emb. 16 × 16; loss fn. CE (0.1 smooth.); optimizer LAMB; lr sched. cos. decay (5 w.u.); lr (start / base / stop) 1e-7 / 6e-5 / 1e-6; momentum 0.9; dropout path 0.1 (S) / 0.2 (B); opt. ε 1e-8; augment rand.aug. / aug3; llrd 0.65. (c) Finetuning: batch size 512; epochs 100; img. size 224 × 224; pos. emb. 24 × 24; loss fn. CE (0.1 smooth.); optimizer AdamW; lr sched. cos. decay (5 w.u.); lr (start / base / stop) 1e-6 / 1e-5 / 1e-5; dropout path 0.1 (S) / 0.2 (B); opt. ε 1e-8; augment rand.aug. / aug3; llrd 0.9. (d) Segmentation Finetuning: batch size 512; epochs 400; img. size 512 × 512; pos. emb. 48 × 48; loss fn. BCE + Focal; optimizer AdamW; lr sched. cos. decay (5 w.u.); lr (start / base / stop) 1e-6 / 1e-5 / 1e-5; dropout path 0.1 (S) / 0.2 (B); opt. ε 1e-8; augment rand.aug. / aug3; crop scale / ratio (0.5, 1.0) / (0.8, 1.2); llrd 0.85.
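
To make the schedule entries in Table D.1 more concrete, the following is a minimal sketch of a cosine-decay learning-rate schedule with warm-up, filled in with the tokenizer retrofitting values from stage (b). It is not taken from the authors' code. It assumes that "5 w.u." means five warm-up epochs, that the warm-up is linear from the start to the base rate, and that the exponents are negative (1e-7 / 6e-5 / 1e-6). The names `RETROFIT` and `lr_at_epoch` are hypothetical.

```python
import math

# Values transcribed from Table D.1(b); the key names are illustrative only.
RETROFIT = {
    "epochs": 100,
    "warmup_epochs": 5,  # assumed reading of "5 w.u."
    "lr_start": 1e-7,
    "lr_base": 6e-5,
    "lr_stop": 1e-6,
}


def lr_at_epoch(epoch, cfg=RETROFIT):
    """Linear warm-up from lr_start to lr_base, then cosine decay to lr_stop."""
    warmup = cfg["warmup_epochs"]
    if epoch < warmup:
        # Assumed linear ramp during the warm-up epochs.
        frac = epoch / warmup
        return cfg["lr_start"] + frac * (cfg["lr_base"] - cfg["lr_start"])
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    progress = min(1.0, (epoch - warmup) / (cfg["epochs"] - warmup))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return cfg["lr_stop"] + cosine * (cfg["lr_base"] - cfg["lr_stop"])


if __name__ == "__main__":
    for epoch in (0, 5, 50, 100):
        print(epoch, lr_at_epoch(epoch))
```

Under these assumptions the rate starts at the "start" value, peaks at the "base" value once warm-up ends, and reaches the "stop" value in the final epoch. A per-step rather than per-epoch schedule would follow the same shape with finer granularity.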