Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Single-pass Adaptive Image Tokenization for Minimum Program Search

Authors: Shivam Duggal, Sanghyun Byun, Bill Freeman, Antonio Torralba, Phillip Isola

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We compare KARL both quantitatively (Tab. 1, Tab. 2) and qualitatively (Fig. 4, Fig. 15, Fig. 16) against recent adaptive tokenization methods using standard image reconstruction metrics. KARL performs competitively across all single-image metrics: LPIPS, SSIM, PSNR, and DreamSim. We also compare these tokenizers in terms of the minimum number of tokens (K̂C) required to reconstruct the input image to a given reconstruction quality (Tab. 2, Fig. 17, Fig. 18).
Researcher Affiliation	Collaboration	Shivam Duggal, Sanghyun Byun, William T. Freeman, Antonio Torralba, Phillip Isola; Massachusetts Institute of Technology, LG Electronics
Pseudocode	Yes	Figure 3: Training algorithm. KARL follows an upside-down RL paradigm. (a) The Estimate Image Complexity stage samples task-defining inputs {image, token budget T, reconstruction error ϵ0} by attempting lossless compression (ϵ = 0). (b) The Learn to Tokenize stage trains the model, conditioned on ϵ0, to match the same quality using T + ΔT tokens while halting the extra (rightmost) ΔT tokens.
Open Source Code	Yes	Code: https://github.com/ShivamDuggal4/kolmogorov-tokenizer
Open Datasets	Yes	Datasets: Throughout the paper, we use ImageNet or a 100-class subset of ImageNet to train and evaluate our models. The 100-class subset of ImageNet is used in multiple prior works and serves as a good dataset with decent scale (0.1M images) and lower compute requirements. Human evaluation (Fig. 14) was done on the SAVOIAS dataset [33], which contains human annotations of complexity on several known computer vision datasets.
Dataset Splits	No	The paper mentions "5000 IN100 validation images" in Table 3. However, it does not specify the training and test splits or the overall split percentages for the ImageNet or IN100 datasets used in the experiments.
Hardware Specification	Yes	Compute Requirements: The majority of training was done on a single A100 or H100 machine with eight 80 GB GPUs, or on machines with equivalent GPU memory (distributed training on 4 H200 GPUs).
Software Dependencies	No	The paper does not explicitly state specific version numbers for any software components, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	Our training framework follows a two-phase formulation as described in the main paper: (1) Estimating Image Complexity, where the model attempts near-lossless reconstruction using a randomly sampled token budget T, and (2) Learning to Tokenize at Estimated Complexity, where the model learns to ignore surplus tokens when given a larger token budget T + ΔT while maintaining the same reconstruction quality. ... We initialize a list of discrete ℓ1 loss targets, including low values (0.0, 0.01, 0.02), mid-range values (0.03 to 0.11), and a few larger ones (0.14 to 0.4), each mapped to an embedding vector of the same dimensionality as the encoder's input tokens. In each training iteration, a token count is randomly sampled from 16, 32, ..., 256. ... For VQGAN tokenizers, we use a cross-entropy loss over predicted logits and ground-truth codebook indices. For VAE tokenizers, we use a mean squared error (MSE) loss over the predicted and target token embeddings. ... This includes a pixel-wise ℓ1 loss, an adversarial GAN loss, an LPIPS perceptual loss, and standard quantization losses (commitment and codebook losses).
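The Research Type row compares tokenizers by the minimum number of tokens (K̂C) needed to reach a given reconstruction quality. The sketch below assumes that reconstruction errors have already been measured at several token budgets for one image. Under that assumption, K̂C is the smallest budget whose error is at or below the chosen threshold. The function name and the error values are hypothetical illustrations, not code or results from the paper. A brute-force sweep like this only defines the quantity; it is not KARL's single-pass procedure.

```python
from typing import Mapping, Optional


def min_tokens_for_quality(errors_by_budget: Mapping[int, float], threshold: float) -> Optional[int]:
    """Hypothetical helper: smallest token budget whose error is <= threshold.

    errors_by_budget maps a token count to a reconstruction error (e.g. an l1 loss)
    measured for a single image. Returns None if no budget reaches the threshold.
    """
    for budget in sorted(errors_by_budget):  # scan budgets from smallest to largest
        if errors_by_budget[budget] <= threshold:
            return budget
    return None


# Made-up per-budget l1 errors for one image (illustrative only, not paper results).
errors = {32: 0.12, 64: 0.08, 128: 0.05, 256: 0.03}
print(min_tokens_for_quality(errors, threshold=0.06))  # 128
print(min_tokens_for_quality(errors, threshold=0.01))  # None
```

A stricter threshold raises the minimum budget. A threshold that no budget meets returns None rather than the largest budget, so unreachable quality targets stay visible.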
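The Pseudocode and Experiment Setup rows describe how each training task is assembled. The sketch below shows one possible way to do this, under several assumptions:

- `recon_error` is a hypothetical stand-in for running the tokenizer with ϵ = 0 at budget T.
- Token budgets are assumed to be the multiples of 16 from 16 to 256.
- The rule for sampling ΔT is an assumption, because the source does not state it.

In upside-down RL terms, the error actually achieved in stage (a) becomes the command that the model is conditioned on in stage (b).

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Assumption: "16, 32, ..., 256" read as every multiple of 16 in that range.
TOKEN_BUDGETS = list(range(16, 257, 16))


@dataclass
class Task:
    budget: int            # total tokens given to the encoder, T + delta_T
    target_error: float    # eps_0, the error reached with T tokens in stage (a)
    halt_mask: List[bool]  # True for the extra (rightmost) tokens to be halted


def sample_task(image: object, recon_error: Callable[[object, int], float], rng: random.Random) -> Task:
    # Stage (a): pick a random budget T (leaving room for delta_T > 0), attempt
    # lossless compression, and record the error eps_0 actually reached.
    t = rng.choice(TOKEN_BUDGETS[:-1])
    eps0 = recon_error(image, t)
    # Stage (b): hand the model T + delta_T tokens, condition it on eps_0, and mark
    # the rightmost delta_T tokens for halting. The delta_T rule is hypothetical.
    delta_t = rng.choice([b - t for b in TOKEN_BUDGETS if b > t])
    halt_mask = [False] * t + [True] * delta_t
    return Task(budget=t + delta_t, target_error=eps0, halt_mask=halt_mask)


# Toy error function (hypothetical): error shrinks as the token budget grows.
task = sample_task("image", lambda img, n: 1.0 / n, random.Random(0))
print(task.budget, task.target_error, sum(task.halt_mask))
```

With this construction, the halting mask always spans the full budget T + ΔT. The first T positions are kept, and the target error is whatever stage (a) reached with those T tokens.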
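The Experiment Setup row maps discrete ℓ1 loss targets to embedding vectors with the encoder's token dimensionality. A minimal NumPy sketch follows. Only the low values (0.0, 0.01, 0.02) and the endpoints of the other two ranges come from the source. The following are all assumptions made for illustration:

- the intermediate grid values;
- the nearest-target lookup;
- the random (rather than learned) embedding table;
- prepending the embedding as an extra input token.

```python
import numpy as np

# Low targets are from the source; the mid-range step (0.02) and the larger
# values between 0.14 and 0.4 are assumed for illustration.
LOSS_TARGETS = np.array(
    [0.0, 0.01, 0.02]
    + [round(0.03 + 0.02 * i, 2) for i in range(5)]  # 0.03 .. 0.11 (assumed step)
    + [0.14, 0.2, 0.3, 0.4]                          # assumed larger targets
)
EMBED_DIM = 8  # hypothetical; should match the encoder's input token dimensionality

# Random stand-in for a learned embedding table: one row per loss target.
rng = np.random.default_rng(0)
TARGET_EMBEDDINGS = rng.normal(size=(len(LOSS_TARGETS), EMBED_DIM))


def nearest_target_index(eps: float) -> int:
    """Snap a continuous error to the closest discrete loss target (assumed rule)."""
    return int(np.argmin(np.abs(LOSS_TARGETS - eps)))


def condition_tokens(tokens: np.ndarray, eps: float) -> np.ndarray:
    """Prepend the loss-target embedding to (N, EMBED_DIM) encoder tokens (assumed)."""
    emb = TARGET_EMBEDDINGS[nearest_target_index(eps)]
    return np.concatenate([emb[None, :], tokens], axis=0)


tokens = np.zeros((32, EMBED_DIM))
out = condition_tokens(tokens, eps=0.062)
print(out.shape)  # (33, 8)
```

Because each target shares the token dimensionality, the conditioning signal can travel through the encoder like any other token. This is why the embedding size is tied to `EMBED_DIM` here.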
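The Experiment Setup row also specifies the token-prediction losses. VQGAN tokenizers use cross-entropy over codebook logits, and VAE tokenizers use mean squared error over token embeddings. The hypothetical `token_loss` helper below sketches both in NumPy. It does not cover the pixel-wise ℓ1, GAN, LPIPS, and quantization losses listed in the same row.

```python
import numpy as np


def token_loss(pred: np.ndarray, target: np.ndarray, kind: str) -> float:
    """Hypothetical helper: token-prediction loss for the two tokenizer types."""
    if kind == "vqgan":
        # pred: (N, K) logits over a K-entry codebook; target: (N,) integer indices.
        shifted = pred - pred.max(axis=1, keepdims=True)  # subtract row max for stability
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        # Mean negative log-likelihood of the ground-truth codebook index.
        return float(-log_probs[np.arange(len(target)), target].mean())
    if kind == "vae":
        # pred, target: (N, D) continuous token embeddings.
        return float(np.mean((pred - target) ** 2))
    raise ValueError(f"unknown tokenizer kind: {kind!r}")


uniform_logits = np.zeros((2, 4))
print(token_loss(uniform_logits, np.array([0, 3]), "vqgan"))  # log(4) ≈ 1.386
print(token_loss(np.zeros((3, 5)), np.ones((3, 5)), "vae"))    # 1.0
```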