Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Sparse-Plus-Low-Rank Quasi-Newton Method for Entropic-Regularized Optimal Transport
Authors: Chenrui Wang, Yixuan Qiu
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the convergence properties of the proposed algorithm are rigorously analyzed, and various numerical experiments are conducted to demonstrate its improved performance in solving large-scale OT problems. |
| Researcher Affiliation | Academia | 1School of Statistics and Data Science, Shanghai University of Finance and Economics, Shanghai, China. Correspondence to: Yixuan Qiu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 The Sparse-plus-low-rank quasi-Newton method for entropic-regularized OT Algorithm 2 Sparsification scheme with a given density |
| Open Source Code | Yes | An efficient implementation of the method is included in the RegOT Python package (https://github.com/yixuan/regot-python). The experiments in this section can be reproduced using the code on our Github repository (https://github.com/Aoblex/numerical-experiments). |
| Open Datasets | Yes | Specifically, we randomly select two images from the MNIST (Lecun et al., 1998) or Fashion-MNIST (Xiao et al., 2017) dataset... Finally, we reproduce the ImageNet experiment in Tang & Qiu (2024) that uses OT to characterize the difference between two data distributions. |
| Dataset Splits | No | The paper describes synthetic datasets by their generation mechanism and image datasets by random selection of images, but it does not specify explicit training/test/validation splits for models, nor does it provide sample counts or percentages for partitioning data in a reproducible manner for typical machine learning experiments. |
| Hardware Specification | Yes | All experiments in this article are conducted on a personal computer with an Intel i9-13900K CPU, 32 GB memory, and a Ubuntu 25.04 operating system. |
| Software Dependencies | Yes | All experiments in this article are conducted on a personal computer with an Intel i9-13900K CPU, 32 GB memory, and a Ubuntu 25.04 operating system. |
| Experiment Setup | Yes | We consider both synthetic and realistic OT settings, and fix the regularization parameter to be η = 0.001. To make η comparable for different problems, we normalize all cost matrices to have a unit infinity norm, i.e., M ← M/‖M‖∞. Line search method: Finally, we use the Moré-Thuente line search algorithm (Moré & Thuente, 1994) to compute the step size α_k > 0 that satisfies the Wolfe conditions: f(x_k + α_k d_k) ≤ f(x_k) + c_1 α_k (g_k^T d_k), [g(x_k + α_k d_k)]^T d_k ≥ c_2 (g_k^T d_k), where 0 < c_1 < 1/2 and c_1 < c_2 < 1 are pre-specified constants. To avoid introducing a large approximation error, we dynamically set τ_k to be the current gradient norm ‖g_k‖, so that the τ_k I term in (5) is negligible when x_k is close to the optimum. An additional safeguard is to set a maximum shift τ_max, in case ‖g_k‖ is too large at the beginning. So overall, we take τ_k = min{τ_max, ‖g_k‖} in each iteration. Based on this idea, the update rule for ρ_k is: ρ_k = max{ρ_min, 0.99ρ_{k−1}} if ‖g_k‖ < ‖g_{k−1}‖, and ρ_k = min{ρ_max, 1.1ρ_{k−1}} otherwise. |
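The Experiment Setup cell quotes three concrete rules: a Wolfe-condition check for the line-search step size, the dynamic shift τ_k = min{τ_max, ‖g_k‖}, and the multiplicative update of the density parameter ρ_k. A minimal sketch of those rules, assuming illustrative function names and default constants (c_1 = 10⁻⁴, c_2 = 0.9, and the ρ/τ bounds are placeholders, not values from the paper):

```python
import numpy as np


def wolfe_conditions(f, g, x, d, alpha, c1=1e-4, c2=0.9):
    """Check the weak Wolfe conditions for a candidate step size alpha.

    f: objective function, g: gradient function, d: descent direction.
    Requires 0 < c1 < 1/2 and c1 < c2 < 1, as stated in the quote.
    """
    gtd = g(x) @ d  # g_k^T d_k, negative for a descent direction
    sufficient_decrease = f(x + alpha * d) <= f(x) + c1 * alpha * gtd
    curvature = g(x + alpha * d) @ d >= c2 * gtd
    return sufficient_decrease and curvature


def update_shift(grad_norm, tau_max=1.0):
    # tau_k = min{tau_max, ||g_k||}: the shift vanishes as the iterate
    # approaches the optimum, and is capped when ||g_k|| starts large.
    return min(tau_max, grad_norm)


def update_density(rho_prev, grad_norm, grad_norm_prev,
                   rho_min=0.01, rho_max=0.5):
    # Shrink the sparsification density when the gradient norm decreased,
    # grow it otherwise, clipped to [rho_min, rho_max].
    if grad_norm < grad_norm_prev:
        return max(rho_min, 0.99 * rho_prev)
    return min(rho_max, 1.1 * rho_prev)
```

The Moré-Thuente algorithm itself searches over α until a check like `wolfe_conditions` passes; the two update helpers mirror the quoted formulas term by term.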