Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Functional data analysis for multivariate distributions through Wasserstein slicing
Authors: Han Chen, Hans-Georg Müller
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We conducted Monte Carlo experiments under two simulation settings: (A) random bivariate normals and (B) mixtures of two bivariate normals. Details of the data generation process appear in Table 1. In both settings, we generate 50 random densities, each from latent parameters, and sample 200 observations per density." and "We analyze blood pressure data from the Baltimore Longitudinal Study of Aging (BLSA) https://www.blsa.nih.gov/." |
| Researcher Affiliation | Academia | Han Chen Department of Statistics University of California, Davis Davis, CA 95616 EMAIL Hans-Georg Müller Department of Statistics University of California, Davis Davis, CA 95616 EMAIL |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is provided along with the submission. |
| Open Datasets | Yes | "We analyze blood pressure data from the Baltimore Longitudinal Study of Aging (BLSA) https://www.blsa.nih.gov/." and "Our second data illustration involves analyzing maximum and minimum temperature data obtained from the National Centers for Environmental Information, with raw data conveniently accessible at https://www.ncdc.noaa.gov/." |
| Dataset Splits | No | The paper describes how data was generated for simulations and how real-world data was processed (e.g., "For each age group, we estimate joint two-dimensional densities"), but it does not specify explicit training, validation, or test dataset splits in the conventional sense for model evaluation. |
| Hardware Specification | Yes | All data analyses were performed on a MacBook Pro with an Apple M1 chip and 16GB RAM, running R version 4.3.1 (2023-06-16) under macOS Big Sur. |
| Software Dependencies | Yes | All data analyses were performed on a MacBook Pro with an Apple M1 chip and 16GB RAM, running R version 4.3.1 (2023-06-16) under macOS Big Sur. |
| Experiment Setup | Yes | "We conducted Monte Carlo experiments under two simulation settings: (A) random bivariate normals and (B) mixtures of two bivariate normals. Details of the data generation process appear in Table 1." and "The estimator is applied over 51 equidistant grid points in each direction, covering the domains [50, 205] for SBP and [40, 125] for DBP with Gaussian kernel and bandwidths h_SBP = 24 and h_DBP = 15." and "In terms of the selection of tuning parameter, By selecting the first K components in the expansion (11) and applying the regularized inverse transformation, the truncated representations can be obtained through f_i(x, K, τ) = Ψ_τ^{-1}(…)." |
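
The density estimation setup quoted in the Experiment Setup row (51 equidistant grid points per direction, domains [50, 205] for SBP and [40, 125] for DBP, Gaussian kernel with bandwidths 24 and 15) can be illustrated with a minimal sketch. This is not the authors' code, which is in R and not reproduced here. The product form of the kernel, the function name `product_gaussian_kde`, and the synthetic blood-pressure-like sample (its mean, covariance, and size of 200) are all assumptions made for illustration only.

```python
import numpy as np


def product_gaussian_kde(samples, grid_x, grid_y, h_x, h_y):
    """Evaluate a product-Gaussian-kernel density estimate on a rectangular grid.

    Assumption: the bivariate kernel is a product of two univariate Gaussian
    kernels, one per coordinate. The source only says "Gaussian kernel".
    """
    # Standardized distances between every grid point and every observation.
    u = (grid_x[:, None] - samples[None, :, 0]) / h_x  # shape (len(grid_x), n)
    v = (grid_y[:, None] - samples[None, :, 1]) / h_y  # shape (len(grid_y), n)
    k_x = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h_x)
    k_y = np.exp(-0.5 * v**2) / (np.sqrt(2.0 * np.pi) * h_y)
    # Summing k_x * k_y over observations is a matrix product; divide by n.
    return k_x @ k_y.T / samples.shape[0]


# Hypothetical stand-in data (NOT the BLSA data): 200 draws from one bivariate normal.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(
    mean=[125.0, 80.0],
    cov=[[300.0, 120.0], [120.0, 150.0]],
    size=200,
)

# Grid and bandwidths as quoted in the Experiment Setup row.
grid_sbp = np.linspace(50.0, 205.0, 51)
grid_dbp = np.linspace(40.0, 125.0, 51)
density = product_gaussian_kde(samples, grid_sbp, grid_dbp, h_x=24.0, h_y=15.0)

# Riemann-sum approximation of the probability mass captured by the grid.
cell_area = (grid_sbp[1] - grid_sbp[0]) * (grid_dbp[1] - grid_dbp[0])
mass = density.sum() * cell_area
print(density.shape, round(float(mass), 3))
```

The printed mass falls somewhat below 1 because the smoothed density extends past the grid boundaries; renormalizing the estimate over the grid would be an additional step and is not shown here.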