Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fair Kernel Regression through Cross-Covariance Operators
Authors: Adrián Pérez-Suay, Paula Gordaliza, Jean-Michel Loubes, Dino Sejdinovic, Gustau Camps-Valls
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide empirical evidence of the performance of the proposed methods in a set of experiments. Firstly, numerical evidence of convergence of the loss bound in the EO linear regression setting is provided over a simulation set. Secondly, we study the trade-off between error rates and fairness in the proposed cross-covariance metric; the results cover six databases. Thirdly, an empirical comparison of the weights behaviour in the linear model evaluation. |
| Researcher Affiliation | Academia | Adrián Pérez-Suay, Image Processing Laboratory (IPL), Universitat de València; Paula Gordaliza, Basque Center for Applied Mathematics, Universidad Pública de Navarra; Jean-Michel Loubes, Institut de Mathématiques de Toulouse, University Toulouse 3; Dino Sejdinovic, School of Computer and Mathematical Sciences, University of Adelaide; Gustau Camps-Valls, Image Processing Laboratory (IPL), Universitat de València |
| Pseudocode | No | The paper primarily focuses on mathematical formulations, theoretical analysis, and empirical evaluations. It does not include any clearly labeled pseudocode or algorithm blocks presenting step-by-step procedures in a structured, code-like format. |
| Open Source Code | Yes | A working implementation, demos and code snippets are available at https://www.uv.es/pesuaya/data/code/2023_FACIL.zip. |
| Open Datasets | Yes | The second set of experiments uses four real datasets (over six considered protected variables)... In particular, we consider: 1) the Adult income dataset (Dua and Graff, 2017), 2) the Communities and Crime (Redmond, 2009) (C&C), 3) the National Longitudinal Survey of Youth (Bureau of Labor Statistics, 2019) (NLSY), and 4) the Compas recidivism risk score data (Larson et al., 2016). |
| Dataset Splits | Yes | We split data into training, validation and test independent sets. We fix the size of the training set to N = 600 samples, the size of the validation set to 100 samples, and the test set to 2000 samples, or the remainder available. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not explicitly mention specific version numbers for any software dependencies, programming languages, or libraries used in the implementation. |
| Experiment Setup | No | The paper mentions hyperparameters λ and µ and that they are tuned by cross-validation or fixed a priori, and experiments are run for 25 independent trials. However, it does not provide specific values for these hyperparameters or other system-level training settings used in the experiments (e.g., ranges for λ and µ, optimization algorithms, learning rates, batch sizes, or number of epochs). |
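
The split sizes quoted under Dataset Splits (600 training, 100 validation, and up to 2000 test samples, or the remainder available) repeated over the 25 independent trials mentioned under Experiment Setup can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_indices`, the use of a uniform random permutation, and the per-trial seeding are all assumptions, since the table does not say how samples were assigned to each set.

```python
import numpy as np


def split_indices(n_samples, n_train=600, n_val=100, n_test_max=2000, seed=0):
    """Return disjoint train/validation/test index arrays.

    Hypothetical helper: the sizes follow the quoted split, but the random
    permutation and the seed handling are assumptions for illustration.
    """
    if n_samples <= n_train + n_val:
        raise ValueError("not enough samples to leave a non-empty test set")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    # Take up to n_test_max samples, or whatever remains if fewer are left.
    test = perm[n_train + n_val:n_train + n_val + n_test_max]
    return train, val, test


# One independent split per trial (25 trials, as quoted above);
# using the trial number as the seed is an assumption.
n_trials = 25
splits = [split_indices(5000, seed=trial) for trial in range(n_trials)]
```

For a dataset with fewer than 2700 samples, the test set simply shrinks to the samples left over after the training and validation sets are drawn.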