Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Characterization of translation invariant MMD on $\mathbb{R}^d$ and connections with Wasserstein distances
Authors: Thibault Modeste, Clément Dombry
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A short numerical experiment illustrates our findings in the framework of the one-sample-test. We propose a simple numerical experiment illustrating the behaviour of the various MMDs considered in this paper in the context of the One-Sample-Test. We report in Figure 1 the rejection rates of the tests corresponding to these different distances for DGP1 and DGP2 respectively. |
| Researcher Affiliation | Academia | Thibault Modeste EMAIL Institut Camille Jordan Université Claude Bernard Lyon 1 CNRS UMR 5208, F-69622 Villeurbanne, France Clément Dombry EMAIL Université de Franche-Comté, CNRS, LmB (UMR 6623), F-25000 Besançon, France |
| Pseudocode | No | The paper describes methodologies and proofs using mathematical notation and prose but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an unambiguous statement or a direct link indicating that the authors have released source code for the methodology described in this paper. It mentions existing work with open-source implications (e.g., MMD and Wasserstein GANs) but not code specific to their contributions. |
| Open Datasets | No | The paper uses simulated data from well-known theoretical distributions (standard Gaussian distribution, Student distribution) for its numerical experiments. It does not provide concrete access information (links, DOIs, repositories, or specific citations) for a publicly available or open dataset in the typical sense of machine learning datasets. |
| Dataset Splits | No | The paper describes using a sample of size n=100 and a simulated independent sample of size m=500 for a one-sample test. These are sample sizes for simulated data generation and comparison, not traditional dataset splits (e.g., train/test/validation) of an existing dataset for model training or evaluation. |
| Hardware Specification | No | The paper describes numerical experiments but does not provide specific details about the hardware (e.g., GPU/CPU models, memory, or cluster specifications) used to run these experiments. |
| Software Dependencies | No | The paper describes mathematical frameworks and statistical tests but does not specify any ancillary software or library names with version numbers that would be needed to replicate the experiments. |
| Experiment Setup | Yes | We consider the tests as described above with n = 100, m = 500, B = 1000 and α = 0.05 and the following distances: GK: the MMD associated with the Gaussian kernel with variance $\sigma^2 = d$, i.e. $k(x, y) = \exp(-\lVert x - y \rVert^2/(2d))$ (similar to Example 1); ESK1-ESK3: the MMD associated with energy score kernel with power α = 0.25, 0.5 and 0.75 respectively (see Example 4); MGK: the MMD associated with the modified Gaussian kernel $k(x, y) = \exp(-\lVert x - y \rVert^2/(2d)) + d^{-1} x^{\top} y$ (see Example 6). W1: the Wasserstein distance of order 1. |
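
The kernels quoted in the Experiment Setup row can be illustrated with a short NumPy sketch. This is not the authors' code, and several details are assumptions made for illustration only: the dimension `d = 5`, the random seed, all function names, the distance-induced form of the energy score kernel (the paper's Example 4 may use a different normalization), and the reading of the extra MGK term as the linear kernel $d^{-1} x^{\top} y$. The sketch computes a biased (V-statistic) estimate of the squared MMD between a sample of size n = 100 and a simulated sample of size m = 500; the calibration of the test (B and α) and the Wasserstein distance W1 are left out because the procedure is not reproduced in this summary.

```python
from functools import partial

import numpy as np


def pairwise_sq_dists(x, y):
    """Squared Euclidean distances between every row of x and every row of y."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def gaussian_kernel(x, y):
    """GK: Gaussian kernel with variance sigma^2 = d."""
    d = x.shape[1]
    return np.exp(-pairwise_sq_dists(x, y) / (2.0 * d))


def modified_gaussian_kernel(x, y):
    """MGK: Gaussian kernel plus a linear term x^T y / d (assumed reading)."""
    d = x.shape[1]
    return gaussian_kernel(x, y) + (x @ y.T) / d


def energy_kernel(x, y, alpha):
    """ESK (assumed form): |x|^alpha + |y|^alpha - |x - y|^alpha."""
    norm_x = np.linalg.norm(x, axis=1)[:, None] ** alpha
    norm_y = np.linalg.norm(y, axis=1)[None, :] ** alpha
    return norm_x + norm_y - pairwise_sq_dists(x, y) ** (alpha / 2.0)


def mmd2(x, y, kernel):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


kernels = {
    "GK": gaussian_kernel,
    "ESK1": partial(energy_kernel, alpha=0.25),
    "ESK2": partial(energy_kernel, alpha=0.5),
    "ESK3": partial(energy_kernel, alpha=0.75),
    "MGK": modified_gaussian_kernel,
}

rng = np.random.default_rng(0)   # seed chosen arbitrarily
d, n, m = 5, 100, 500            # d = 5 is a hypothetical dimension
x = rng.standard_normal((n, d))  # observed sample (here standard Gaussian)
y = rng.standard_normal((m, d))  # simulated independent reference sample

for name, kernel in kernels.items():
    print(f"{name}: MMD^2 estimate = {mmd2(x, y, kernel):.6f}")
```

Shifting both samples by the same vector should leave each estimate unchanged up to rounding error, which gives a quick numerical check of the translation invariance that the paper's title refers to.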