Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation

Authors: Zhi-Kai Chen, Jun-Peng Jiang, Han-Jia Ye, De-Chuan Zhan

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on multiple text-to-image benchmarks demonstrate a 1.71× speedup over standard AR models, while preserving both image fidelity and diversity. 5 Experiment
Researcher Affiliation | Academia | 1 School of Artificial Intelligence, Nanjing University, China 2 National Key Laboratory for Novel Software Technology, Nanjing University, China EMAIL
Pseudocode | Yes | Figure 2: An overview of our Hawk method is presented. During each iteration of the inference process, horizontal and vertical speculations are generated using the draft head. The vertical speculations are stored in the Speculation Cache for future use when processing subsequent lines. Meanwhile, the horizontal speculations are combined with the previous vertical speculations to create the speculation sampling pool. From this pool, tree decoding candidates are generated, followed by a verification step akin to tree speculative decoding.
Open Source Code | No | The code for this paper will be released upon acceptance.
Open Datasets | Yes | For the benchmark, we use the test sets from COCO 2017 [22] and Flickr30K [29]. From these datasets, we sample 500 examples from each for evaluation.
Dataset Splits | Yes | For the benchmark, we use the test sets from COCO 2017 [22] and Flickr30K [29]. From these datasets, we sample 500 examples from each for evaluation.
Hardware Specification | Yes | All experiments are conducted on RTX 3090 GPUs.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, only mentioning the use of the AdamW optimizer without a version.
Experiment Setup | Yes | We evaluate our proposed Hawk method on the Lumina-mGPT model [23]. For testing, we generate 768 × 768 images and use top-k sampling with k = 2000 and a temperature of 1.0. We also employ classifier-free guidance [8] with a guidance scale of 3.0. To train our Spatial Draft Head, we largely follow the same procedure used for Medusa and Lumina-mGPT. We only train the draft head, while keeping the rest of the model frozen. We use the AdamW [24] optimizer with a weight decay of 0.1 and β = (0.9, 0.95). The base learning rate is set to 2 × 10⁻⁵. We set the draft head balance weight λk to 1. We use 6,000 images sampled from the LAION aesthetic training set [32] to fine-tune our drafter head.
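
The decoding settings quoted under Experiment Setup (top-k sampling with k = 2000, temperature 1.0, guidance scale 3.0) can be illustrated with a minimal, standard-library sketch. This is not the authors' code, which is unreleased. The function names are hypothetical, and the guidance formula `uncond + scale * (cond - uncond)` is one common formulation assumed here; the quoted text does not say which variant the paper uses.

```python
import math
import random


def guided_logits(cond, uncond, scale=3.0):
    """Classifier-free guidance on logits (assumed form):
    uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def top_k_probs(logits, k=2000, temperature=1.0):
    """Return {token_id: probability} over the k highest-scoring tokens."""
    k = min(k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return {i: w / total for i, w in zip(top, weights)}


def sample_token(cond, uncond, k=2000, temperature=1.0, scale=3.0, rng=random):
    """Draw one token id from the guided, top-k-truncated distribution."""
    probs = top_k_probs(guided_logits(cond, uncond, scale), k, temperature)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]
```

Calling `sample_token(cond, uncond)` with the conditional and unconditional logits for the next image token applies the quoted defaults. A speculative decoder such as Hawk would verify drafted tokens against this same target distribution, but that verification step is not shown here.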