Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Bespoke Solvers for Generative Flow Models

Authors: Neta Shaul, Juan Perez, Ricky T. Q. Chen, Ali Thabet, Albert Pumarola, Yaron Lipman

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	For example, a Bespoke solver for a CIFAR10 model produces samples with Fréchet Inception Distance (FID) of 2.73 with 10 NFE, and gets to 1% of the Ground Truth (GT) FID (2.59) for this model with only 20 NFE.
Researcher Affiliation	Collaboration	N. Shaul (1), J. Perez (2), R. T. Q. Chen (3), A. Thabet (2), A. Pumarola (2), Y. Lipman (3, 1). Affiliations: (1) Weizmann Institute of Science; (2) Gen AI, Meta; (3) FAIR, Meta.
Pseudocode	Yes	Algorithm 1 Numerical ODE solver.
Open Source Code	No	No explicit statement or link to the open-source code for the methodology described in this paper is provided. The paper cites a third-party tool's GitHub, but not its own implementation.
Open Datasets	Yes	Our method works with pre-trained models: we use the pre-trained CIFAR10 (Krizhevsky & Hinton, 2009) model of (Song et al., 2020b) with published weights from EDM (Karras et al., 2022). Additionally, we trained diffusion/flow models on the datasets: CIFAR10, AFHQ-256 (Choi et al., 2020a) and ImageNet-64/128 (Deng et al., 2009).
Dataset Splits	Yes	We compute FID (Heusel et al., 2017) and validation RMSE (equation 6) is computed on a set of 10K fresh noise samples x0 ∼ p(x0); Figure 12 depicts an example of RMSE vs. training iterations for different n values. Unless otherwise stated, below we report results on best FID iteration and show samples on best RMSE validation iteration.
Hardware Specification	No	Table 5: Pre-trained models hyper-parameters. GPUs 8 8 64 64 64. Specific GPU models or processor types are not mentioned, only the number of GPUs used.
Software Dependencies	No	The paper mentions "Adam optimizer Kingma & Ba (2017)", "DOPRI5 method (Shampine, 1986)", and refers to "Chen, 2018" likely for `torchdiffeq`. However, no specific version numbers for these or other software components are provided.
Experiment Setup	Yes	Table 3: Hyper-parameters of Bespoke solvers training on CIFAR10 / ImageNet-64 / ImageNet-128 / AFHQ-256. Total number of trajectories: 72k / 48k / 48k / 4k. Batch size: 12 / 8 / 8 / 1. Number of iterations: 6k / 6k / 6k / 4k.
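
The hyper-parameters quoted in the Experiment Setup row can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the paper or the index pipeline. It assumes that the four values in each group follow the dataset order given in the table caption, that "k" means 1,000, and that the total number of trajectories is expected to equal batch size times number of iterations (a relation the source does not state explicitly). The names `TABLE_3`, `implied_trajectories`, and `check_table` are hypothetical.

```python
# Values transcribed from the Experiment Setup row (Table 3 of the paper).
# Assumption: columns follow the caption order and "k" means 1,000.
TABLE_3 = {
    "CIFAR10": {"trajectories": 72_000, "batch_size": 12, "iterations": 6_000},
    "ImageNet-64": {"trajectories": 48_000, "batch_size": 8, "iterations": 6_000},
    "ImageNet-128": {"trajectories": 48_000, "batch_size": 8, "iterations": 6_000},
    "AFHQ-256": {"trajectories": 4_000, "batch_size": 1, "iterations": 4_000},
}


def implied_trajectories(batch_size: int, iterations: int) -> int:
    """Trajectories consumed if every iteration uses one full batch (assumed relation)."""
    return batch_size * iterations


def check_table(table: dict) -> dict:
    """Map each dataset to True when the reported total matches the implied one."""
    return {
        name: implied_trajectories(row["batch_size"], row["iterations"])
        == row["trajectories"]
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, ok in check_table(TABLE_3).items():
        print(f"{name}: {'consistent' if ok else 'mismatch'}")
```

Under these assumptions all four columns are internally consistent, which supports reading the flattened row as three groups of four values.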