Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

Authors: Zhekai Chen, Ruihang Chu, Yukang Chen, Shiwei Zhang, Yujie Wei, Yingya Zhang, Xihui Liu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on the powerful VAR model Infinity2B show a notable 8.7% GenEval score improvement (0.69 → 0.75). Key insights reveal that early-stage structural features effectively influence final quality, and resampling efficacy varies across generation scales. ... In this section, we demonstrate the effectiveness of our TTS-VAR on the powerful VAR model Infinity [14] with resampling temperature λ = 10. We present the comparisons in Sec. 5.1 and Sec. 5.2, and precisely analyze design details in Sec. 5.4 and Sec. 5.5. Following previous work [24, 22, 51], we utilize ImageReward [52] as the reward function for guidance. We evaluate results using the main metric GenEval [53], and T2I-CompBench [54], with relevant indicators ImageReward [52], HPSv2.1 [55, 56], Aesthetic V2.5 [57], and CLIP-Score [58, 59], based on prompts offered by GenEval.
Researcher Affiliation	Collaboration	Zhekai Chen¹, Ruihang Chu², Yukang Chen³, Shiwei Zhang², Yujie Wei², Yingya Zhang², Xihui Liu¹; ¹HKU MMLab, ²Tongyi Lab, Alibaba Group, ³CUHK
Pseudocode	Yes	We describe the algorithm of TTS-VAR in Alg. 1. Following the generation process of VAR [13] (Infinity [14]), TTS-VAR first predicts the residual tokens at the current scale and adds them to the accumulated feature maps.
Open Source Code	Yes	Code is available at https://github.com/ali-vilab/TTS-VAR.
Open Datasets	Yes	We evaluate results using the main metric GenEval [53], and T2I-CompBench [54], with relevant indicators ImageReward [52], HPSv2.1 [55, 56], Aesthetic V2.5 [57], and CLIP-Score [58, 59], based on prompts offered by GenEval.
Dataset Splits	Yes	We evaluate results using the main metric GenEval [53], and T2I-CompBench [54], with relevant indicators ImageReward [52], HPSv2.1 [55, 56], Aesthetic V2.5 [57], and CLIP-Score [58, 59], based on prompts offered by GenEval. ... For each prompt, we generated four corresponding images for evaluation using seeds 0 through 3.
Hardware Specification	Yes	We evaluate our method with N = 2, 4 on the GPU Nvidia A800-SXM4-80GB.
Software Dependencies	No	We implement comparisons on using different reward models to rate the intermediate images and calculate the potential scores (VALUE), including Aesthetic [57], ImageReward [52], HPSv2 [56], and HPS+ImageReward. ... In K-Means, we set the next batch size b_{i+1} in the adaptive descending batch sizes as the number of centers, to find b_{i+1} different structure categories. For PCA, we only select the first major component, as we utilize it for dimension reduction only.
Experiment Setup	Yes	In this section, we demonstrate the effectiveness of our TTS-VAR on the powerful VAR model Infinity [14] with resampling temperature λ = 10. ... Specifically, the adaptive batch size here is [8,8,6,6,6,4,2,2,2,1,1,1,1]. This batch size schedule enables more possibilities with little additional consumption. ... We opt to resample only at scales 6 and 9. ... We specifically apply clustering on these scales [2, 5].
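The Research Type row reports an 8.7% GenEval improvement alongside the scores 0.69 and 0.75. These numbers agree only as a relative gain, since the absolute difference is 0.06. The short Python check below makes this explicit; `relative_improvement` is an illustrative helper, not part of the TTS-VAR code.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    return (improved - baseline) / baseline


gain = relative_improvement(0.69, 0.75)
print(f"{gain:.1%}")  # prints "8.7%"
```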
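The Pseudocode row describes the coarse-to-fine loop that TTS-VAR inherits from VAR and Infinity: at each scale, residual tokens are predicted and added to the accumulated feature map. The NumPy sketch below illustrates only that accumulation pattern. The scale sizes, the nearest-neighbour upsampling, and the random `predict_residual` stand-in are assumptions made for illustration. The real models predict quantized tokens with a transformer conditioned on the accumulated map and may interpolate differently.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (h, w, c) map to (size, size, c)."""
    h, w, _ = x.shape
    if size % h or size % w:
        raise ValueError("target size must be a multiple of the input size")
    return np.repeat(np.repeat(x, size // h, axis=0), size // w, axis=1)


def predict_residual(accumulated: np.ndarray, side: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for the model: ignores `accumulated` and returns noise."""
    return rng.standard_normal((side, side, accumulated.shape[-1]))


def coarse_to_fine(scales: list[int], final_side: int, channels: int, seed: int = 0) -> np.ndarray:
    """Accumulate per-scale residuals into one full-resolution feature map."""
    rng = np.random.default_rng(seed)
    accumulated = np.zeros((final_side, final_side, channels))
    for side in scales:
        residual = predict_residual(accumulated, side, rng)     # tokens for this scale
        accumulated += upsample_nearest(residual, final_side)  # add to the running map
    return accumulated


final_map = coarse_to_fine([1, 2, 4, 8], final_side=8, channels=4)  # hypothetical scale sizes
print(final_map.shape)  # (8, 8, 4)
```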
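The Dataset Splits row fixes the evaluation protocol at four images per prompt, drawn with seeds 0 through 3. A minimal way to enumerate those generation jobs is sketched below. The two prompts are placeholders rather than GenEval prompts, and `evaluation_jobs` is a hypothetical helper.

```python
from itertools import product


def evaluation_jobs(prompts: list[str], num_seeds: int = 4) -> list[tuple[str, int]]:
    """One (prompt, seed) job per image: seeds 0 .. num_seeds - 1 for every prompt."""
    return list(product(prompts, range(num_seeds)))


jobs = evaluation_jobs(["a photo of a cat", "two red apples"])  # placeholder prompts
print(len(jobs))  # 8
print(jobs[:2])   # [('a photo of a cat', 0), ('a photo of a cat', 1)]
```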
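The Software Dependencies row mentions K-Means with b_{i+1} centers, one per structure category kept for the next scale, and PCA restricted to the first principal component. The NumPy sketch below implements both pieces from scratch. The excerpt does not state how TTS-VAR extracts the features it clusters, or whether PCA is applied before K-Means or compared against it, so the two functions are kept separate. The farthest-point initialization and the choice to keep the sample nearest each center are assumptions, not details from the paper.

```python
import numpy as np


def pca_first_component(features: np.ndarray) -> np.ndarray:
    """Project (n, d) features onto their first principal component, giving shape (n,)."""
    x = np.asarray(features, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]


def kmeans_select(features: np.ndarray, k: int, iters: int = 50) -> list[int]:
    """Lloyd's K-Means with k centers; returns the index of the sample nearest each center."""
    x = np.asarray(features, dtype=float)
    # Deterministic farthest-point initialization (an assumption, not from the paper).
    init = [x[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min([((x - c) ** 2).sum(axis=1) for c in init], axis=0)
        init.append(x[dist_to_nearest.argmax()])
    centers = np.stack(init)
    for _ in range(iters):
        # Assignment step: squared Euclidean distance from every sample to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: move each center to its members' mean; empty clusters keep their center.
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return [int(dists[:, j].argmin()) for j in range(k)]


feats = np.array([[0, 0], [0, 0.1], [10, 10], [10, 10.1], [-10, 10], [-10, 10.1]])
print(kmeans_select(feats, k=3))         # one index from each of the three pairs
print(pca_first_component(feats).shape)  # (6,)
```

Keeping one representative per cluster is what makes the selection favor structurally different candidates rather than the top-k by score.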
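The Experiment Setup row gives the adaptive batch schedule, the clustering scales [2, 5], the resampling scales 6 and 9, and the temperature λ = 10. The sketch below records these values and shows one plausible resampling step. It assumes selection weights proportional to exp(λ·r) over reward scores r and multinomial sampling with replacement; the excerpt does not give the exact formula, and the reward values used here are made up. The sketch also checks an observation that follows from the quoted numbers: read with 0-based indices, the batch shrinks exactly at scales 2, 5, 6, and 9, which is the union of the clustering and resampling scales.

```python
import numpy as np

# Values quoted in the Experiment Setup row; the indexing convention is not stated.
BATCH_SCHEDULE = [8, 8, 6, 6, 6, 4, 2, 2, 2, 1, 1, 1, 1]
CLUSTER_SCALES = {2, 5}
RESAMPLE_SCALES = {6, 9}
LAMBDA = 10.0


def resampling_probs(rewards, temperature: float = LAMBDA) -> np.ndarray:
    """Assumed weighting: softmax of temperature * reward."""
    logits = temperature * np.asarray(rewards, dtype=float)
    weights = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return weights / weights.sum()


def resample(rewards, next_batch: int, temperature: float = LAMBDA, seed: int = 0) -> np.ndarray:
    """Draw `next_batch` candidate indices with replacement (assumed multinomial scheme)."""
    rng = np.random.default_rng(seed)
    probs = resampling_probs(rewards, temperature)
    return rng.choice(len(probs), size=next_batch, replace=True, p=probs)


# 0-based scale indices at which the batch size decreases.
shrink_points = [i for i in range(1, len(BATCH_SCHEDULE))
                 if BATCH_SCHEDULE[i] < BATCH_SCHEDULE[i - 1]]
print(shrink_points)                             # [2, 5, 6, 9]
print(sorted(CLUSTER_SCALES | RESAMPLE_SCALES))  # [2, 5, 6, 9]

# Example: at scale 6 the batch goes from BATCH_SCHEDULE[5] = 4 to BATCH_SCHEDULE[6] = 2.
picked = resample([0.1, 0.5, 0.2, 0.9], next_batch=BATCH_SCHEDULE[6])  # made-up rewards
print(picked)
```

With λ = 10, a reward gap of 0.4 already translates into a weight ratio of e^4 ≈ 55, so resampling at this temperature strongly concentrates on the best-scoring candidates.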