Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
BM$^2$: Coupled Schrödinger Bridge Matching
Authors: Stefano Peluchetti
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are presented in Section 5, followed by a discussion of related works in Section 6. Section 7 concludes the paper. For clarity, a more general formulation of BM$^2$ is deferred to Appendix A, all proofs to Appendix B, an additional numerical experiment to Appendix C, and code listings to Appendix D. ... 5 Numerical Experiments ... To evaluate the performance of BM$^2$ on EOT problems, we utilize the benchmark developed by Gushchin et al. (2023). ... Results for evaluation metrics (20) and (21) are summarized in Table 1 and Table 2, respectively. |
| Researcher Affiliation | Industry | Stefano Peluchetti, Sakana AI |
| Pseudocode | Yes | Finally, the implementation is straightforward (i, iv), as illustrated in Algorithms 1 and 2 and in the annotated PyTorch code of Listing 1. ... Algorithm 1: BM$^2$ training loss computation ... Algorithm 2: BM$^2$ training loop ... Listing 1: Basic implementation of BM$^2$ loss computation (Algorithm 1) in PyTorch. |
| Open Source Code | Yes | Finally, the implementation is straightforward (i, iv), as illustrated in Algorithms 1 and 2 and in the annotated PyTorch code of Listing 1. ... D Python Code |
| Open Datasets | Yes | To evaluate the performance of BM$^2$ on EOT problems, we utilize the benchmark developed by Gushchin et al. (2023). For the reference process (R), this benchmark provides pairs of target distributions Ψ_0, Ψ_1 with analytical EOT solution S_{0,1} and analytical SB-optimal drift function µ_s. |
| Dataset Splits | No | The paper does not explicitly state training, validation, or test dataset splits. It mentions using a 'benchmark developed by Gushchin et al. (2023)' and states, 'We use 1,000 Monte Carlo samples to estimate (20, 21). Each method undergoes 50,000 SGD training steps with a batch size of 1,000'. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch code of Listing 1' and the 'AdamW optimizer' but does not specify version numbers for PyTorch, Python, or other software libraries or frameworks used in the implementation or experiments. |
| Experiment Setup | Yes | Each method undergoes 50,000 SGD training steps with a batch size of 1,000, settings similar to those used by Gushchin et al. (2023), enabling qualitative comparison of our results with theirs. We use the AdamW optimizer with a learning rate of 10^-4 and hyperparameters β = (0.9, 0.999), ϵ = 10^-8, wd = 0.01, where wd denotes weight decay. Time is sampled as t ∼ U(ϵ, 1 − ϵ) for ϵ = 0.0025. For BM$^2$, we employ a single feedforward neural network with 3 layers of width 768 and ReLU activation, resulting in approximately 1 million parameters. |
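The quoted model size (a width-768 ReLU feedforward network with "approximately 1 million parameters") can be sanity-checked with a short parameter-count sketch. The input/output dimensions below are assumptions, not taken from the excerpt: the Gushchin et al. (2023) benchmark spans several dimensionalities, and "3 layers" could mean three hidden layers or three linear layers, so the count is indicative only.

```python
def linear_params(d_in, d_out):
    # Fully connected layer: weight matrix (d_in x d_out) plus bias vector.
    return d_in * d_out + d_out

def mlp_params(d_in, width, d_out, n_hidden=3):
    # Feedforward network with n_hidden equal-width hidden layers.
    dims = [d_in] + [width] * n_hidden + [d_out]
    return sum(linear_params(a, b) for a, b in zip(dims, dims[1:]))

# Assumed setup: a 64-dimensional benchmark problem, with the time t
# appended to the state as one extra input coordinate (an assumption,
# not stated in the excerpt).
d = 64
print(mlp_params(d_in=d + 1, width=768, d_out=d))  # 1281088, i.e. ~1.28M
```

Reading "3 layers" as three linear layers instead (`n_hidden=2`) gives roughly 0.69M parameters for the same dimensions; both readings are in the ballpark of the paper's "approximately 1 million".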