Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Authors: Gleb Rodionov, Roman Garipov, Alina Shutova, George Yakushev, Erik Schultheis, Vage Egiazarian, Anton Sinitsin, Denis Kuznedelev, Dan Alistarh

NeurIPS 2025

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | In this section, we conduct an initial evaluation of Hogwild! Inference to test its ability to collaborate in our zero-shot setting. All evaluations in this section are done with the QwQ-32B [Qwen Team, 2025] model. We consider two tasks: one with obviously independent tasks that can be done in parallel and another with a more complicated collaboration pattern.

Researcher Affiliation | Collaboration | Gleb Rodionov (Yandex); Roman Garipov (HSE University, Yandex); Alina Shutova (HSE University, Yandex); George Yakushev (HSE University, Yandex); Erik Schultheis (IST Austria); Vage Egiazarian (IST Austria); Anton Sinitsin (Yandex); Denis Kuznedelev (Yandex); Dan Alistarh (IST Austria)

Pseudocode | No | The paper describes the inference algorithm in Section 3.4 and its implementation details in Appendix B. However, it does not contain a clearly labeled "Pseudocode" or "Algorithm" block, nor any structured, code-like presentation of the procedure.

Open Source Code | Yes | Our implementation is available at https://github.com/eqimp/hogwild_llm.

Open Datasets | Yes | D.3 Datasets and Benchmarks: This subsection provides links to all datasets and benchmarks referenced in this work, along with their respective licenses.
- GSM8K: https://huggingface.co/datasets/openai/gsm8k (License: MIT)
- LIMO: https://huggingface.co/datasets/GAIR/LIMO (License: Apache 2.0)
- OlympiadBench: https://huggingface.co/datasets/Hothan/OlympiadBench (License: Apache 2.0)
- LiveCodeBench: https://huggingface.co/datasets/livecodebench/code_generation_lite (License: cc)
- AIME25: https://huggingface.co/datasets/math-ai/aime25 (License: Apache 2.0)

Dataset Splits | Yes | Sanity Checks with GSM8k: Before we try our approach on more challenging tasks, we test if Hogwild! Inference is capable of basic collaboration. For this purpose, we construct a toy problem set with 128 samples, each containing 5 non-overlapping questions from the GSM8k test set [Cobbe et al., 2021].

Hardware Specification | Yes | The experiments were conducted primarily on NVIDIA A100 GPU servers with NVSwitch, with DeepSeek-R1 experiments running in a distributed setup. The one exception is the inference-time experiments in Section 4.4, which were run on NVIDIA L40S GPUs.

Software Dependencies | No | The paper mentions software such as PyTorch and FlashAttention-2. It does not provide version numbers for these or for any other key dependencies required to reproduce the experiments.

Experiment Setup | Yes | D Detailed Experiment Configuration: For the main experiments, we use Hogwild! Inference with two workers (Alice and Bob), a combined layout, and the prompting techniques described in Appendix C.