Accelerating Sinkhorn algorithm with sparse Newton iterations

Authors: Xun Tang, Michael Shavlovsky, Holakou Rahmanian, Elisa Tardini, Kiran Koshy Thekumparampil, Tesi Xiao, Lexing Ying

ICLR 2024

Reproducibility assessment. Each entry lists the variable, the result, and the LLM response that supports it.
Research Type: Experimental. Supporting quote: "We conduct numerical experiments to compare the original Sinkhorn and SNS algorithms. We use the following settings: We obtained the runtime data on a 2021 Macbook Pro laptop with an Apple M1 Pro chip. The linear system solving is done through the conjugate gradient step as mentioned in Algorithm 1. To set a challenging case, we use an entropic regularization with η = 1200 throughout the experiments. We refer the reader to Appendix F for the performance of SNS under different entropy parameter η."
Researcher Affiliation: Collaboration. Supporting quote: "Xun Tang, Michael Shavlovsky, Holakou Rahmanian, Elisa Tardini, Kiran Koshy Thekumparampil, Tesi Xiao, Lexing Ying; Stanford University; Amazon; {xutang,lexing}@stanford.edu; {shavlov, holakou, ettardin, kkt, tesixiao}@amazon.com"
Pseudocode: Yes. Supporting quote: "Algorithm 1 Sinkhorn-Newton-Sparse (SNS)"
Open Source Code: No. Justification: The paper does not provide any links to open-source code or explicitly state that the code for their method is available.
Open Datasets: Yes. Supporting quote: "In the second numerical experiment, similar to the experiment setting in Cuturi (2013), we consider the more practical case of optimal transport on the MNIST dataset. In particular, two images are respectively converted to a vector of intensities on the 28 × 28 pixel grid, which are then normalized to sum to 1."
Dataset Splits: No. Justification: The paper uses the MNIST dataset but does not specify train/validation/test splits or whether standard splits were used; the experiments compute optimal transport between distributions/images rather than training a model with explicit splits.
Hardware Specification: Yes. Supporting quote: "We obtained the runtime data on a 2021 Macbook Pro laptop with an Apple M1 Pro chip."
Software Dependencies: No. Justification: The paper mentions a conjugate gradient step for solving linear systems but names no libraries, frameworks, or language versions.
Experiment Setup: Yes. Supporting quote: "To set a challenging case, we use an entropic regularization with η = 1200 throughout the experiments. We run Algorithm 1 with N1 = 20 and a target sparsity of λ = 2/n. (...) As the Sinkhorn algorithm converges quite slowly, we pick N1 = 700 before switching to the Newton stage. Choosing a target sparsity of λ = 15/n is sufficient for convergence to the ground truth."
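The quoted setup follows the two-stage structure of Algorithm 1: N1 warm-up Sinkhorn iterations, then Newton steps on a sparsified Hessian whose linear systems are solved by conjugate gradient. For orientation, the warm-up stage can be sketched in log-domain form. This is a minimal sketch, not the authors' implementation: the Newton stage, sparsity thresholding, and CG solve are omitted, and the demo uses a mild η for fast convergence rather than the paper's deliberately challenging η = 1200.

```python
import numpy as np

def _lse(A, axis):
    # Numerically stable log-sum-exp along an axis.
    m = A.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(A - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_sinkhorn(C, r, c, eta, n_iters):
    """Log-domain Sinkhorn iterations for entropic OT with kernel K = exp(-eta * C).

    C: (n, m) cost matrix; r, c: source/target marginals, each summing to 1.
    Returns the transport plan P = diag(u) K diag(v).
    """
    M = -eta * C                                  # log-kernel
    log_r, log_c = np.log(r), np.log(c)
    log_u = np.zeros_like(log_r)
    log_v = np.zeros_like(log_c)
    for _ in range(n_iters):
        # Alternately match the row marginals (u-update) and column marginals (v-update).
        log_u = log_r - _lse(M + log_v[None, :], axis=1)
        log_v = log_c - _lse(M + log_u[:, None], axis=0)
    return np.exp(M + log_u[:, None] + log_v[None, :])

# Toy problem: random cost, uniform marginals.
rng = np.random.default_rng(0)
n = 6
C = rng.random((n, n))
r = np.full(n, 1.0 / n)
c = np.full(n, 1.0 / n)
P = log_sinkhorn(C, r, c, eta=10.0, n_iters=500)
print("max row-marginal error:", np.abs(P.sum(axis=1) - r).max())
```

At η = 1200 the naive scaling form u = r / (Kv) underflows, since kernel entries are as small as exp(-1200); the log-domain form stays stable but, as the quote notes, converges slowly, which is what motivates switching to the sparse Newton stage after N1 iterations.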