Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Single-pass Adaptive Image Tokenization for Minimum Program Search
Authors: Shivam Duggal, Sanghyun Byun, Bill Freeman, Antonio Torralba, Phillip Isola
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare KARL both quantitatively (Tab. 1, Tab. 2) and qualitatively (Fig. 4, Fig. 15, Fig. 16) against recent adaptive tokenization methods using standard image reconstruction metrics. KARL performs competitively across all single-image metrics: LPIPS, SSIM, PSNR, and DreamSim. We also compare these tokenizers in terms of the minimum number of tokens ($\hat{KC}$) required to reconstruct the input image to a given reconstruction quality (Tab. 2, Fig. 17, Fig. 18). |
| Researcher Affiliation | Collaboration | Shivam Duggal, Sanghyun Byun, William T. Freeman, Antonio Torralba, Phillip Isola; Massachusetts Institute of Technology; LG Electronics |
| Pseudocode | Yes | Figure 3: Training algorithm. KARL follows an upside-down RL paradigm: (a) the Estimate Image Complexity stage samples task-defining inputs {image, token budget T, reconstruction error ϵ0} by attempting lossless compression (ϵ = 0). (b) The Learn to Tokenize stage trains the model conditioned on ϵ0 to match the same quality using T + ΔT tokens while halting the extra (rightmost) ΔT. |
| Open Source Code | Yes | Code: https://github.com/ShivamDuggal4/kolmogorov-tokenizer. |
| Open Datasets | Yes | Datasets: Throughout the paper, we utilize ImageNet or a 100-class subset of the ImageNet dataset to train and evaluate our models. The 100-class subset of ImageNet is utilized in multiple prior works, serving as a good dataset with decent scale (0.1M images) and lower compute requirements. Human evaluation (Fig. 14) was done on the Savoias dataset [33], which contains human annotations of complexity on several known computer vision datasets. |
| Dataset Splits | No | The paper mentions "5000 IN100 validation images" in Table 3. However, it does not specify the training and test splits or the overall split percentages for the Imagenet or IN100 dataset used for experiments. |
| Hardware Specification | Yes | Compute Requirements: Majority of the training was done on single A100 or H100 machine with eight 80GB gpus or on machines with equivalent GPU memory (distributed training on 4 H200 gpus). |
| Software Dependencies | No | The paper does not explicitly state specific version numbers for any software components, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Our training framework follows a two-phase formulation as described in the main paper: (1) Estimating Image Complexity, where the model attempts near-lossless reconstruction using a randomly sampled token budget T, and (2) Learning to Tokenize at Estimated Complexity, where the model learns to ignore surplus tokens when provided a larger token budget T + ΔT, while maintaining the same reconstruction quality. ... We initialize a list of discrete ℓ1 loss targets including low values (0.0, 0.01, 0.02), mid-range values (0.03–0.11), and a few larger ones (0.14–0.4), each mapped to an embedding vector of the same dimensionality as the encoder's input tokens. In each training iteration, a token count is randomly sampled from {16, 32, ..., 256}. ... For VQGAN tokenizers, we use a cross-entropy loss over predicted logits and ground-truth codebook indices. For VAE tokenizers, we use mean squared error (MSE) loss over the predicted and target token embeddings. ... This includes pixel-wise ℓ1 loss, an adversarial GAN loss, LPIPS perceptual loss, and standard quantization losses (commitment and codebook loss). |
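
The task sampling quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the names `TOKEN_BUDGETS`, `LOSS_TARGETS`, `nearest_target_index`, and `sample_task` are hypothetical. The step of 16 between token budgets, the spacing of loss targets inside each quoted range, and the rule for choosing the surplus are all assumptions, since the quoted text elides them.

```python
import random

# Token budgets "16, 32, ..., 256" from the quoted setup.
# ASSUMPTION: a constant step of 16; the source elides the middle values.
TOKEN_BUDGETS = list(range(16, 257, 16))

# Discrete l1 loss targets. Only 0.0, 0.01, 0.02 and the range endpoints
# (0.03 to 0.11, 0.14 to 0.4) come from the quoted text.
# ASSUMPTION: the values chosen inside each range are illustrative.
LOSS_TARGETS = (
    [0.0, 0.01, 0.02]
    + [round(0.03 + 0.01 * i, 2) for i in range(9)]  # 0.03 ... 0.11
    + [0.14, 0.2, 0.3, 0.4]
)


def nearest_target_index(l1_error):
    """Return the index of the discrete target closest to a measured l1 error.

    In the described setup each target maps to an embedding vector; here the
    index stands in for that embedding lookup.
    """
    return min(
        range(len(LOSS_TARGETS)),
        key=lambda i: abs(LOSS_TARGETS[i] - l1_error),
    )


def sample_task(measured_l1, rng):
    """Sample one training task: a budget T, a surplus, and a target index.

    measured_l1 stands in for the reconstruction error obtained in phase 1
    when the model reconstructs the image with budget T.
    """
    t = rng.choice(TOKEN_BUDGETS)
    # Phase 2 gives the model a larger budget and expects it to halt the
    # surplus tokens. ASSUMPTION: the larger budget is drawn from the same
    # list, so the total never exceeds the largest budget.
    larger = [b for b in TOKEN_BUDGETS if b > t]
    surplus = (rng.choice(larger) - t) if larger else 0
    return {
        "T": t,
        "surplus": surplus,
        "target_idx": nearest_target_index(measured_l1),
    }


if __name__ == "__main__":
    print(sample_task(0.05, random.Random(0)))
```

Conditioning on a discrete target index rather than the raw error mirrors the quoted description of mapping each loss target to an embedding; the model and the reconstruction losses themselves are outside the scope of this sketch.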