Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Variance-Aware Estimation of Kernel Mean Embedding
Authors: Geoffrey Wolfer, Pierre Alquier
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6.2, we put our methods into practice, first in the context of hypothesis testing, and second by improving the results of Briol et al. (2019) and Chérief-Abdellatif and Alquier (2022) in the context of robust parametric maximum mean discrepancy estimation. ... Figure 1: Comparison of the test based on the Bernstein empirical (EmpBer) bound, versus the test based on McDiarmid bound (McDia), and the test based on the Monte-Carlo estimation of the quantile q_{1-α}. Frequency of rejection of H₀ : P ∈ {N((1, 1), I₂)} as a function of σ with P = N(0, σ²I₂). |
| Researcher Affiliation | Academia | Geoffrey Wolfer EMAIL Center for Data Science Waseda University 1-6-1 Nishiwaseda, Shinjuku-ku Tokyo 169-8050, Japan Pierre Alquier EMAIL ESSEC Business School Asia-Pacific campus 5 Nepal Park 575749 Singapore |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. The methods are described through mathematical formulations, theorems, and proofs. |
| Open Source Code | No | The paper includes a license statement: 'License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/23-0161.html.' This refers to the paper's license, not the availability of source code for the methodology. No other concrete statement or link regarding code release is present. |
| Open Datasets | No | The experiments in Section 6 describe using synthetic data, such as 'P = N(0, σ²I₂) with P_θ = N(θ, I₂)' for simulations, rather than referring to established public datasets with access information. |
| Dataset Splits | No | The paper describes simulation-based experiments using generated data (e.g., Gaussian distributions) and does not specify any training, testing, or validation splits for a dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers needed to replicate the experiments. |
| Experiment Setup | Yes | The kernel used is a Gaussian kernel with γ = 1, and we consider sample sizes n ∈ {16, 40, 100, 250} (Section 6.1.1). For comparisons, experiments are run for 'fixed sample size (n = 10000), fixed confidence level (δ = 0.1), fixed variance parameter (σ = 3) and two different contamination levels (ξ = 0.01 and ξ = 0.2)' (Section 6.2.2). |
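
Since no code is released with the paper, the following is a minimal sketch of how the synthetic setup summarized in the table could be regenerated. It is not the authors' implementation and does not reproduce their Bernstein or McDiarmid tests. It draws a sample from P = N(0, σ²I₂) and a reference sample from N((1, 1), I₂), then computes a plain biased (V-statistic) estimate of the squared maximum mean discrepancy with a Gaussian kernel at γ = 1. The kernel parameterization exp(-‖x - y‖²/γ²), the value σ = 2, the seed, the use of a second sample for the reference distribution, and the helper names `gaussian_kernel`, `mmd2_biased`, and `sample_setup` are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, gamma=1.0):
    """Gram matrix of a Gaussian kernel between the rows of x and y.

    Assumed parameterization: k(x, y) = exp(-||x - y||^2 / gamma^2).
    The paper's exact convention is not stated in the summary above.
    """
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / gamma**2)


def mmd2_biased(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (
        gaussian_kernel(x, x, gamma).mean()
        + gaussian_kernel(y, y, gamma).mean()
        - 2.0 * gaussian_kernel(x, y, gamma).mean()
    )


def sample_setup(n, sigma, rng):
    """Draw n points from N(0, sigma^2 I_2) and n points from N((1, 1), I_2)."""
    x = rng.normal(loc=0.0, scale=sigma, size=(n, 2))
    y = rng.normal(loc=1.0, scale=1.0, size=(n, 2))
    return x, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # arbitrary seed, chosen for repeatability
    for n in (16, 40, 100, 250):  # sample sizes quoted from Section 6.1.1
        x, y = sample_setup(n, sigma=2.0, rng=rng)  # sigma = 2 is illustrative
        print(n, mmd2_biased(x, y, gamma=1.0))
```

The V-statistic form is used here only because it is short and always non-negative up to rounding. An unbiased U-statistic, or the variance-aware bounds studied in the paper, would need additional code.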