Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fast PCA in 1-D Wasserstein Spaces via B-splines Representation and Metric Projection
Authors: Matteo Pegoraro, Mario Beraha
AAAI 2021, pp. 9342-9349
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive simulation studies, we show how our PCA performs similarly to the ones already proposed in the literature while retaining a much smaller computational cost. We apply our method to a real dataset of mortality rates due to Covid-19 in the US, concluding that our analyses are consistent with the current scientific consensus on the disease. |
| Researcher Affiliation | Academia | ¹MOX, Department of Mathematics, Politecnico di Milano; ²Department of Mathematics, Politecnico di Milano; ³Department of Computer Science, Università di Bologna |
| Pseudocode | No | The paper describes mathematical formulations and optimization problems but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | No | The paper mentions a public repository link: "The code is publicly available at https://github.com/ecazelles/2017-GPCA-vs-LogPCA-Wasserstein". However, this link points to the code of an existing method (Cazelles et al. 2017), which the authors used for comparison, not to code for the new projected PCA method described in the paper. |
| Open Datasets | Yes | Data are freely available at https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-S/9bhg-hcku. |
| Dataset Splits | Yes | Each result displays the average 10-fold cross validation accuracy, averaged again over 20 repetitions ± one standard deviation. |
| Hardware Specification | Yes | All experiments were performed on a laptop equipped with an 8-core Intel i7-7700HQ CPU at 2.80GHz and 16GB of RAM. |
| Software Dependencies | Yes | The main numerical libraries employed consist of the Python packages numpy, scipy and qpsolvers (v 1.1) and of the optimization library Ipopt (v 3.12.12) interfaced with the Python package pyomo. |
| Experiment Setup | Yes | In the following, we will always center the PCA in the barycenter of the data, i.e. $a_0 = n^{-1}\sum_{i=1}^{n} a_i$. Moreover, we consider the spline basis $\{\psi_j\}_{j=1}^{J}$ with $J = 20$ and equispaced knots in $[0, 1]$... After performing a PCA, a Support Vector Machine (SVM) classifier is fit, with parameters $C = 1.0$, radial basis function kernel and default value for the parameter $\gamma$. |
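The spline representation and barycenter centering in the Experiment Setup row can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes cubic B-splines (the degree is not given above) and a clamped knot vector whose equispaced breakpoints yield exactly $J = 20$ basis functions. The data are hypothetical synthetic Beta quantile functions, and plain least-squares coefficients stand in for the paper's metric projection. The helper `design_matrix` is hypothetical. In 1-D Wasserstein space, the barycenter's quantile function is the average of the data's quantile functions. Because the basis is linear, averaging the coefficients therefore gives $a_0$ directly.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.stats import beta

J = 20  # number of basis functions, as in the setup
K = 3   # spline degree: ASSUMPTION (cubic), not stated in the source

# Clamped knot vector on [0, 1] with equispaced breakpoints,
# chosen so that len(t) - K - 1 == J basis functions.
breaks = np.linspace(0.0, 1.0, J - K + 1)
t = np.concatenate([np.zeros(K), breaks, np.ones(K)])

def design_matrix(x):
    """Hypothetical helper: evaluate all J basis functions at x -> shape (len(x), J)."""
    return BSpline(t, np.eye(J), K)(x)

# Hypothetical data: quantile functions of n Beta distributions on [0, 1].
rng = np.random.default_rng(0)
n, m = 30, 200
p = (np.arange(m) + 0.5) / m                  # midpoint grid in (0, 1)
params = rng.uniform(2.0, 6.0, size=(n, 2))
Q = np.stack([beta.ppf(p, a, b) for a, b in params])  # shape (n, m)

# Spline coefficients a_i by unconstrained least squares.
# SIMPLIFICATION: the paper's monotonicity-preserving projection is not reproduced.
B = design_matrix(p)                          # shape (m, J)
A = np.linalg.lstsq(B, Q.T, rcond=None)[0].T  # shape (n, J)

# Barycenter a_0 = n^{-1} sum_i a_i, then center the coefficients at it.
a0 = A.mean(axis=0)
A_centered = A - a0

print("coefficients:", A.shape, "max fit error:", np.abs(B @ A.T - Q.T).max())
```

Because the basis forms a partition of unity on $[0, 1]$, each row of `B` sums to one. This is a quick sanity check that the knot vector is built correctly.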
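The evaluation protocol from the Dataset Splits and Experiment Setup rows is sketched below. It runs a PCA, then an RBF-kernel SVM with $C = 1.0$ and default $\gamma$, scored by 10-fold cross-validation accuracy averaged over 20 repetitions ± one standard deviation. The sketch assumes scikit-learn, which is not among the listed dependencies, and substitutes ordinary Euclidean PCA for the paper's projected Wasserstein PCA. The number of components, the data, and the helper `cv_accuracy` are hypothetical. Fitting PCA inside each fold through a pipeline is also an assumption, since the rows above do not say whether PCA was fit before or within cross-validation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

N_SPLITS, N_REPEATS = 10, 20

def cv_accuracy(X, y, n_components=3, seed=0):
    """Hypothetical helper: mean and std (over repetitions) of 10-fold CV accuracy."""
    model = make_pipeline(
        PCA(n_components=n_components),  # Euclidean stand-in; centers at the feature mean
        SVC(C=1.0, kernel="rbf"),        # gamma left at the library default
    )
    cv = RepeatedStratifiedKFold(n_splits=N_SPLITS, n_repeats=N_REPEATS, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    # Splits are yielded repetition by repetition, so each row is one 10-fold run.
    per_repeat = scores.reshape(N_REPEATS, N_SPLITS).mean(axis=1)
    return per_repeat.mean(), per_repeat.std()

# Hypothetical two-class data with 20 features (e.g. spline coefficients).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (60, 20)), rng.normal(1.0, 1.0, (60, 20))])
y = np.repeat([0, 1], 60)

mean_acc, std_acc = cv_accuracy(X, y)
print(f"accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```

In scikit-learn 0.22 and later, the default `gamma` for `SVC` is `'scale'`. Earlier versions defaulted to `'auto'`, so "default value for γ" depends on the library and version actually used.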