Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Differentiable Generalized Sliced Wasserstein Plans

Authors: Laetitia Chapel, Romain Tavenard, Samuel Vaiter

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of our Differentiable Generalized Sliced Wasserstein Plans, coined DGSWP, by assessing its ability to provide a meaningful approximated OT plan in several contexts. First, we consider a toy example where a non-linear projection must be considered; we then perform gradient flow experiments on Euclidean and hyperbolic spaces, demonstrating the versatility of our approach. Finally, we integrate sliced-OT plans in an OT-based conditional flow matching in lieu of mini-batch OT. In all the experiments, we use ε = 0.05 and N = 20 as DGSWP-specific hyperparameters. Full experimental setups and additional results are provided in App. A.3. Implementation is available online (footnote 3); we also use POT toolbox [21].
Researcher Affiliation | Academia | Laetitia Chapel, IRISA, L'Institut Agro Rennes-Angers, Rennes, France. EMAIL; Romain Tavenard, Université de Rennes 2, IRISA, Rennes, France. EMAIL; Samuel Vaiter, CNRS & Université Côte d'Azur, Laboratoire J. A. Dieudonné, Nice, France. EMAIL
Pseudocode | Yes | Algorithm 1 (in Supplementary material) describes a gradient descent method to perform the minimization of h_ε using this Monte-Carlo approximation.
Open Source Code | Yes | Implementation is available online (footnote 3: https://github.com/rtavenar/dgswp); we also use POT toolbox [21].
Open Datasets | Yes | We conduct experiments on CIFAR10 [31], reporting FID scores for DGSWP-CFM, OT-CFM, and I-CFM across varying numbers of sampling steps.
Dataset Splits | Yes | We conduct experiments on CIFAR10 [31], reporting FID scores for DGSWP-CFM, OT-CFM, and I-CFM across varying numbers of sampling steps. We use the experimental setup and hyperparameters from Tong et al. [54].
Hardware Specification | Yes | All experiments except the Conditional Flow Matching (CFM) were run on a MacBook Pro M2 Max with 32 GB of RAM. On this machine, Fig. 1 took approximately 3 minutes per run (10,000 iterations), Fig. 3 about 6 minutes for 10 runs (with two models trained sequentially, 1,000 iterations), Fig. 4 required roughly 30 minutes, and Fig. 5 took around 10 minutes in total (all models considered, 10 repetitions). The CFM experiments were dispatched over a GPU cluster composed of GPU-A100 80 GB and GPU-A6000 48 GB, with a total runtime of 130 h for training and inference of all presented models.
Software Dependencies | No | The paper mentions "POT toolbox [21]", "Python toolbox geoopt [29]", and "torchCFM [54]". However, no specific version numbers are provided for these software packages, which are required for a reproducible description of software dependencies.
Experiment Setup | Yes | A.3.1 Hyperparameter settings. We report here the hyperparameter configurations used across the main experiments. Figures 1 and 3 correspond to the same experiment: Fig. 3 highlights early training dynamics, while Fig. 1 depicts results at convergence. The projection network used is a 3-layer MLP with ReLU activations (with dimensions 2 → 64 → 16 → 1). Optimization is done using SGD with a learning rate of 0.2; for the variant without variance reduction, a lower learning rate of 0.0002 is used to ensure convergence (cf. Fig. 7, in which the same learning rate is used for both variants). In Figure 4 (gradient flow experiments), we perform 2000 outer flow steps using SGD with a learning rate of 0.01. At each flow step, we execute 20 projection steps (or inner optimization updates when using learnable projectors). For the latter, we use Adam with a learning rate of 0.01. The neural projector for our method is a single-hidden-layer MLP with ReLU activations and He initialization. In Figure 5, which investigates gradient flows on hyperbolic manifolds, we vary the outer learning rate across methods to account for differences in convergence speed: the base learning rate is 2.5, used for HHSW; SW uses a scaled learning rate of 17.5, and DGSWP uses a reduced rate of 0.83. Each flow step is composed of 100 projection or inner optimization steps. For the Conditional Flow Matching (CFM) experiment shown in Figures 6, 8, 9 and Table 1, we adopt the same training hyperparameters as in Tong et al. [54]. For our method specifically, the projection model is a 3-layer fully connected network with SELU activations: 3×32×32 → 256 → 256 → 1. Its parameters are optimized using Adam with a learning rate of 0.01. We perform 1000 optimization steps for the projection model at initialization, followed by 1 step per CFM training iteration.
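
To make the toy setup in the Experiment Setup row more concrete, the sketch below shows how a non-linear 1D projection can induce a transport plan between two point clouds: both clouds are projected to the real line, sorted, and paired rank by rank. It is an illustration written for this page, not the authors' implementation. The layer widths (2, 64, 16, 1) and ReLU activations come from the reported setup; the function names, the random untrained weights, the He-style initialization for this network, and the sample data are assumptions. The training loop, the smoothing parameter ε, and the N samples used by DGSWP are deliberately omitted.

```python
import random


def init_mlp(sizes, rng):
    """Build random (untrained) dense layers for widths such as [2, 64, 16, 1].

    He-style Gaussian initialization is an assumption made for this sketch.
    """
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        std = (2.0 / n_in) ** 0.5
        weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def project(layers, point):
    """Map a point to a scalar; ReLU is applied after every layer but the last."""
    h = list(point)
    for idx, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            h = [max(v, 0.0) for v in h]
    return h[0]


def sliced_plan(layers, sources, targets):
    """Pair the k-th smallest projected source with the k-th smallest projected target.

    For two uniform point clouds of equal size, this rank matching is the
    optimal 1D coupling of the projected samples.
    """
    src_order = sorted(range(len(sources)), key=lambda i: project(layers, sources[i]))
    tgt_order = sorted(range(len(targets)), key=lambda j: project(layers, targets[j]))
    return list(zip(src_order, tgt_order))


def plan_cost(plan, sources, targets):
    """Mean squared Euclidean cost of the plan, measured in the original space."""
    total = 0.0
    for i, j in plan:
        total += sum((a - b) ** 2 for a, b in zip(sources[i], targets[j]))
    return total / len(plan)


rng = random.Random(0)
layers = init_mlp([2, 64, 16, 1], rng)
sources = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(8)]
targets = [(rng.gauss(3.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(8)]
plan = sliced_plan(layers, sources, targets)
cost = plan_cost(plan, sources, targets)
```

Because the plan is a permutation of the samples, its cost in the original space is an upper bound on the exact OT cost between the two clouds; a learned projector would aim to tighten that bound, whereas the random weights here give an arbitrary coupling.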