Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Mean Flows for One-step Generative Modeling

Authors: Zhengyang Geng, Mingyang Deng, Xingjian Bai, Zico Kolter, Kaiming He

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our MeanFlow (MF) model achieves significantly better generation quality than previous state-of-the-art one-step diffusion/flow methods. Here, iCT [43], Shortcut [12], and our MF are all 1-NFE generation, while IMM's 1-step result [52] involves 2-NFE guidance. Detailed numbers are in Tab. 2. Images shown are generated by our 1-NFE model. Our MeanFlow models demonstrate strong empirical performance in one-step generative modeling. On ImageNet 256×256 [6], our method achieves an FID of 3.43 using 1-NFE (Number of Function Evaluations) generation. Section 5 (Experiments), Experiment Setting: We conduct our major experiments on ImageNet [6] generation at 256×256 resolution. We evaluate Fréchet Inception Distance (FID) [17] on 50K generated images. We examine the number of function evaluations (NFE) and study 1-NFE generation by default. Section 5.1 (Ablation Study): We investigate the model properties in Tab. 1, analyzed next.
Researcher Affiliation	Collaboration	Zhengyang Geng¹, Mingyang Deng², Xingjian Bai², J. Zico Kolter¹, Kaiming He². Work partly done when visiting MIT. Mingyang Deng and Xingjian Bai are partially supported by the MIT-IBM Watson AI Lab funding award.
Pseudocode	Yes	Algorithm 1 (MeanFlow: Training). Algorithm 2 (MeanFlow: 1-step Sampling).
Open Source Code	Yes	Our code is available at https://github.com/gsunshine/meanflow.
Open Datasets	Yes	On ImageNet 256×256 [6], our method achieves an FID of 3.43 using 1-NFE (Number of Function Evaluations) generation. We report unconditional generation results on CIFAR-10 [25] (32×32).
Dataset Splits	Yes	We evaluate Fréchet Inception Distance (FID) [17] on 50K generated images. ImageNet 256×256: We use a standard VAE tokenizer to extract the latent representations. CIFAR-10: We experiment with class-unconditional generation on CIFAR-10. Our implementation follows standard Flow Matching practice [29].
Hardware Specification	Yes	We implement the model in JAX and benchmark on v4-8 TPUs. We greatly thank Google TPU Research Cloud (TRC) for granting us access to TPUs.
Software Dependencies	No	In our JAX implementation of Alg. 1, the overhead is less than 20% of the total training time (see appendix). In PyTorch and JAX, jvp returns the function output and JVP. We use Adam [24] with learning rate 0.0006, batch size 1024, (β1, β2) = (0.9, 0.999), dropout 0.2, weight decay 0, and EMA decay of 0.99995.
Experiment Setup	Yes	Table 4: Configurations on ImageNet 256×256: epochs 80, 240, 240, 240, 240, 1000 (one value per configuration); batch size 256; dropout 0.0; optimizer Adam [24]; lr schedule constant; lr 0.0001; Adam (β1, β2) = (0.9, 0.95); weight decay 0.0; EMA decay 0.9999. CIFAR-10: We use Adam with learning rate 0.0006, batch size 1024, (β1, β2) = (0.9, 0.999), dropout 0.2, weight decay 0, and EMA decay of 0.99995. The model is trained for 800K iterations (with 10K warm-up [15]).
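The Research Type and Pseudocode rows refer to 1-NFE generation (Algorithm 2, 1-step sampling). The pure-Python sketch below illustrates the idea under stated assumptions: t = 1 is noise and t = 0 is data, and the model u(z, r, t) predicts the average velocity over [r, t], so a single update z_r = z_t - (t - r) * u(z_t, r, t) jumps directly from t to r. The names `sample`, `AvgVelocity`, and `toy_average_velocity` are hypothetical; the toy field is the exact average velocity for one data point x on the linear path z_t = (1 - t) * x + t * e, which lets the sketch check that one evaluation lands on x. It is not the authors' implementation.

```python
from typing import Callable, Tuple

# Hypothetical interface: u(z, r, t) returns the average velocity over [r, t].
AvgVelocity = Callable[[float, float, float], float]


def sample(u: AvgVelocity, e: float, num_steps: int = 1) -> Tuple[float, int]:
    """Move from t = 1 (noise e) to t = 0 (data) with average-velocity jumps.

    Each jump z_r = z_t - (t - r) * u(z_t, r, t) costs one function evaluation,
    so num_steps == 1 is 1-NFE generation.
    """
    z, nfe = e, 0
    for i in range(num_steps):
        t = 1.0 - i / num_steps
        r = 1.0 - (i + 1) / num_steps
        z = z - (t - r) * u(z, r, t)
        nfe += 1
    return z, nfe


def toy_average_velocity(x: float) -> AvgVelocity:
    """Exact average velocity for a single data point x on z_t = (1 - t) x + t e.

    On this path the velocity e - x equals (z_t - x) / t at every t, so its
    average over any interval [r, t] is that same value.
    """
    return lambda z, r, t: (z - x) / t


x_hat, nfe = sample(toy_average_velocity(2.0), e=-0.5, num_steps=1)
print(x_hat, nfe)  # 2.0 1
```

With a trained network in place of the toy field, `num_steps=1` corresponds to the 1-NFE setting reported above; larger `num_steps` spends more evaluations on shorter jumps.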
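The Pseudocode and Software Dependencies rows mention Algorithm 1 and a `jvp` call that returns both the function output and its Jacobian-vector product. The JAX sketch below shows one way such a training target can be formed, assuming the linear path z = (1 - t) * x + t * e with velocity v = e - x, and the average-velocity relation u = v - (t - r) * du/dt, where the total derivative du/dt = v * (∂u/∂z) + ∂u/∂t is obtained in one `jax.jvp` call with tangents (v, 0, 1) for (z, r, t). The function names are hypothetical, and the plain mean-squared error is an assumption chosen for simplicity; this is a sketch, not the released code.

```python
import jax
import jax.numpy as jnp


def meanflow_target(u_fn, z, r, t, v):
    """Return the model output u and a regression target u_tgt.

    jax.jvp evaluates u_fn at (z, r, t) and, in the same pass, its directional
    derivative along (v, 0, 1), i.e. the total time derivative of u along the path.
    r and t are float arrays (scalars or broadcastable against z).
    """
    u, dudt = jax.jvp(u_fn, (z, r, t), (v, jnp.zeros_like(r), jnp.ones_like(t)))
    u_tgt = v - (t - r) * dudt
    return u, u_tgt


def meanflow_loss(u_fn, x, e, r, t):
    """Squared error with the target treated as a constant (stop-gradient)."""
    z = (1.0 - t) * x + t * e  # interpolate between data x (t = 0) and noise e (t = 1)
    v = e - x                  # velocity of that linear path
    u, u_tgt = meanflow_target(u_fn, z, r, t, v)
    return jnp.mean((u - jax.lax.stop_gradient(u_tgt)) ** 2)


# Toy model u(z, r, t) = z * t, so du/dt = v * t + z.
toy_u = lambda z, r, t: z * t
loss = meanflow_loss(toy_u, jnp.array([0.0, 1.0]), jnp.array([1.0, -1.0]),
                     jnp.array(0.25), jnp.array(0.75))
```

Using forward-mode `jvp` here avoids materializing the full Jacobian: the cost is roughly one extra forward pass, which is consistent in spirit with the small training overhead the Software Dependencies row reports.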
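The Research Type and Dataset Splits rows report Fréchet Inception Distance (FID) [17] on 50K generated images. For reference, the standard definition fits Gaussians to Inception-network features of real images (mean μ_r, covariance Σ_r) and of generated images (μ_g, Σ_g), then takes the Fréchet distance between the two; lower is better:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```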
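The Experiment Setup and Software Dependencies rows give two optimizer configurations. The sketch below collects those values in a small Python dataclass and adds a learning-rate function. `TrainConfig`, its field names, and `learning_rate` are hypothetical. The warm-up shape (a linear ramp) and a constant rate after warm-up for CIFAR-10 are assumptions, since the rows state only "10K warm-up" for CIFAR-10 and a constant schedule for ImageNet; no warm-up is listed for ImageNet above, so none is applied there.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Optimizer settings from the table above (field names are hypothetical)."""
    lr: float
    batch_size: int
    betas: Tuple[float, float]  # Adam (beta1, beta2)
    dropout: float
    weight_decay: float
    ema_decay: float
    warmup_steps: int = 0
    total_steps: Optional[int] = None  # None when training length is given in epochs


# CIFAR-10, class-unconditional: 800K iterations with 10K warm-up.
CIFAR10 = TrainConfig(lr=6e-4, batch_size=1024, betas=(0.9, 0.999), dropout=0.2,
                      weight_decay=0.0, ema_decay=0.99995,
                      warmup_steps=10_000, total_steps=800_000)

# ImageNet 256x256 (Table 4): constant lr; epoch count differs per configuration.
IMAGENET_256 = TrainConfig(lr=1e-4, batch_size=256, betas=(0.9, 0.95), dropout=0.0,
                           weight_decay=0.0, ema_decay=0.9999)
IMAGENET_256_EPOCHS = [80, 240, 240, 240, 240, 1000]


def learning_rate(cfg: TrainConfig, step: int) -> float:
    """Learning rate at a 0-based step.

    Assumption: warm-up ramps linearly up to cfg.lr over cfg.warmup_steps steps,
    and the rate stays constant afterwards.
    """
    if step < cfg.warmup_steps:
        return cfg.lr * (step + 1) / cfg.warmup_steps
    return cfg.lr
```

Under these assumptions, the 10K warm-up steps are 1.25% of the 800K CIFAR-10 iterations.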
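Both configurations above list an EMA decay (0.99995 for CIFAR-10, 0.9999 for ImageNet). Assuming the usual per-step update of an exponential moving average of the weights, a quick rule of thumb is that the average mostly reflects the last 1 / (1 - decay) steps. The helper names in this pure-Python sketch are hypothetical, and weights are modeled as plain lists of floats.

```python
from typing import List


def ema_update(ema: List[float], params: List[float], decay: float) -> List[float]:
    """One EMA step (assumed standard form): ema <- decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]


def effective_window(decay: float) -> float:
    """Approximate number of recent steps the EMA averages over: 1 / (1 - decay)."""
    return 1.0 / (1.0 - decay)


print(round(effective_window(0.99995)))  # 20000 (CIFAR-10 setting)
print(round(effective_window(0.9999)))   # 10000 (ImageNet setting)
```

Starting from zero with fixed weights, after one window (20,000 steps at decay 0.99995) the average has moved about 1 - e^(-1), roughly 63%, of the way to those weights; the 800K CIFAR-10 iterations span about 40 such windows.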