Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Truncated Newton Method for Optimal Transport

Authors: Mete Kemertas, Amir-massoud Farahmand, Allan Jepson

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our GPU-parallel algorithm exhibits exceptionally favorable runtime performance, achieving high precision orders of magnitude faster than many existing alternatives. This is evidenced by wall-clock time experiments on 24 problem sets (12 datasets × 2 cost functions). The scalability of the algorithm is showcased on an extremely large OT problem with n = 10^6, solved approximately under weak entropic regularization.
Researcher Affiliation | Academia | Mete Kemertas (1,4), Amir-massoud Farahmand (2,3,1), Allan D. Jepson (1); (1) University of Toronto, (2) Polytechnique Montréal, (3) Mila - Quebec AI Institute, (4) Vector Institute
Pseudocode | Yes | Algorithm 1 MDOT(C, r, c, γ_i, γ_f, p ≥ 1, q > 1) ... Algorithm 2 NewtonSolve(∇_u g, P, r, c, r(P), η) ... Algorithm 3 ChiSinkhorn(u, v, γ, C, r, c, r(P), ε_χ) ... Algorithm 4 TruncatedNewtonProject(u, v, γ, C, r, c, ε_d)
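For orientation, the entropy-regularized OT subproblem that these routines solve is the same one targeted by the classical Sinkhorn iteration. A minimal NumPy sketch of that standard baseline (not the paper's truncated-Newton solver; the function name, iteration cap, and tolerance are illustrative) is:

```python
import numpy as np

def sinkhorn(C, r, c, gamma, n_iters=1000, tol=1e-9):
    """Standard Sinkhorn iteration for entropy-regularized OT.

    C: (n, m) cost matrix; r, c: source/target marginals (probability
    vectors); gamma: inverse temperature (larger gamma means weaker
    entropic regularization, as in the paper's annealing schedule).
    """
    K = np.exp(-gamma * C)               # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(n_iters):
        v = c / (K.T @ u)                # scale to match column marginals
        u_new = r / (K @ v)              # scale to match row marginals
        if np.max(np.abs(u_new - u)) < tol:
            u = u_new
            break
        u = u_new
    P = u[:, None] * K * v[None, :]      # transport plan
    return P
```

By construction the returned plan satisfies the row marginals exactly and the column marginals up to the stopping tolerance; the paper's contribution is reaching high precision at large gamma far faster than iterations of this kind.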
Open Source Code | Yes | Correspondence to: EMAIL. Code available at github.com/metekemertas/mdot_tnt.
Open Datasets | Yes | Upsampled MNIST. Each image is a probability distribution over pixels, with probabilities given by L1-normalized intensity values. The cost matrix C is fixed across OT problems and constructed from upsampled MNIST... In addition to wall-clock time, we count here the number of primitive operations costing O(n²) for each algorithm. Examples of such primitive operations involving n × n matrices include row/column sums of matrices, matrix-vector products, element-wise multiplication of matrices, element-wise exponentiation/logarithm of matrices, addition/subtraction/multiplication/division between a matrix and a scalar, max over all entries of a matrix, summation over all entries of a matrix, etc. We count the number of primitive operations rather than a higher-level function call such as the number of gradient evaluations due to inherent differences in the design of the various baseline algorithms; e.g., some methods require costly line search or inner loops between gradient evaluations. For all 20 problem sets displayed in Figs. 5-14, we find that the total number of O(n²) operations predicts wall-clock time very well (high correlation), especially when the algorithms are run for long enough, as seen visually upon comparing top and bottom rows of the same column. All algorithms follow the same trend seen in Fig. 2, so that our conclusions once again remain the same.
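The dataset construction quoted above (images as L1-normalized pixel distributions, with a cost matrix over the pixel grid) can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the input here is stand-in random data rather than actual upsampled MNIST, and squared Euclidean grid distance is one common choice of cost.

```python
import numpy as np

def image_to_distribution(img):
    """Flatten an intensity image into an L1-normalized probability vector."""
    x = img.astype(np.float64).ravel()
    return x / x.sum()

def grid_cost_matrix(h, w):
    """Pairwise squared Euclidean distances between pixel coordinates.

    Returns an (h*w, h*w) matrix that is fixed across all OT problems
    on the same grid, matching the setup described above.
    """
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)
    diff = coords[:, None, :] - coords[None, :, :]
    return (diff ** 2).sum(axis=-1)
```

Note that materializing the full (h*w)² cost matrix is quadratic in pixel count, which is exactly why the quoted passage measures cost in O(n²) primitive operations.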
Dataset Splits | No | The paper describes using 24 problem sets (12 datasets × 2 cost functions), sampling 100 problems or 60 random problems from these sets for benchmarking. It mentions specific datasets like 'upsampled MNIST' and the 'DOTmark benchmark' by Schrieber et al. (2017), which consists of 45 discrete OT problems. However, the paper does not specify explicit training, validation, or test dataset splits in terms of percentages or counts for model training or evaluation as would typically be done in a machine learning context. The experiments appear to involve solving optimal transport problems on individual instances from these sets rather than traditional data splitting.
Hardware Specification | Yes | All experiments were run on an NVIDIA GeForce RTX 2080 Ti GPU with 64-bit precision.
Software Dependencies | No | The paper mentions the 'PyKeOps package of Charlier et al. (2021)' but does not provide specific version numbers for this or any other software dependencies.
Experiment Setup | Yes | Alg. 1 is called with γ_i = 2^5, γ_f = 2^18 and p = 1.5. For the adaptive approach, q = 2 initially... we fix regularization weight at γ_f = 2^12... with a fixed temperature decay rate of q = 2^(1/16).
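One plausible reading of the quoted schedule (a hypothetical sketch, not the paper's Algorithm 1, whose exact update and stopping rules are given in the paper) is a geometric ramp of the inverse temperature γ from γ_i to γ_f by the factor q:

```python
def temperature_schedule(gamma_i, gamma_f, q):
    """Geometric annealing of the inverse temperature gamma.

    Hypothetical reconstruction: multiply gamma by the decay rate q
    each round, capping the final round at gamma_f.
    """
    gammas = []
    gamma = gamma_i
    while gamma < gamma_f:
        gammas.append(gamma)
        gamma *= q
    gammas.append(gamma_f)
    return gammas
```

Under this reading, γ_i = 2^5, γ_f = 2^18 with q = 2 gives 14 rounds (γ doubling each time), while the slower rate q = 2^(1/16) would take 16 times as many steps per doubling.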