Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Statistical Robustness of Empirical Risks in Machine Learning
Authors: Shaoyan Guo, Huifu Xu, Liwei Zhang
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies the convergence of empirical risks in reproducing kernel Hilbert spaces (RKHS). A conventional assumption in existing research is that empirical training data are generated by the unknown true probability distribution, but this may not hold in some practical circumstances; consequently, existing convergence results may not guarantee that the empirical risks are reliable when the data are potentially corrupted (i.e., generated by a distribution perturbed from the true one). The paper fills this gap from a robust statistics perspective (Krätschmer, Schied and Zähle, 2012, 2014; Guo and Xu, 2020). First, it derives moderate sufficient conditions under which the expected risk changes stably (continuously) under small perturbations of the probability distribution of the underlying random variables, and shows how the cost function and kernel affect this stability. Second, it examines the difference between the laws of the statistical estimators of the expected optimal loss based on pure data and on contaminated data, using the Prokhorov and Kantorovich metrics, and derives asymptotic qualitative and non-asymptotic quantitative statistical robustness results (see the illustrative sketch after this table). Third, it identifies appropriate metrics under which the statistical estimators are uniformly asymptotically consistent. These results provide theoretical grounding for analysing asymptotic convergence and the reliability of the statistical estimators in a number of regression models. |
| Researcher Affiliation | Academia | Shaoyan Guo, School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, China; Huifu Xu, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong; Liwei Zhang, School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, China |
| Pseudocode | No | The paper presents theoretical analysis, proofs, and mathematical derivations for statistical robustness and consistency of empirical risks. It does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper is a theoretical work focusing on statistical analysis and mathematical derivations. It does not mention any specific source code for release, nor does it provide links to any repositories. |
| Open Datasets | No | The paper focuses on theoretical aspects of statistical robustness in machine learning. It discusses generic 'empirical training data' but does not refer to specific datasets or provide any access information for publicly available datasets. |
| Dataset Splits | No | As this paper is theoretical and does not conduct experiments with specific datasets, there is no mention of training, test, or validation dataset splits. |
| Hardware Specification | No | The paper is a theoretical study and does not describe any experiments. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical analysis. It does not mention any software or library dependencies (with version numbers) that would be needed for an experimental implementation. |
| Experiment Setup | No | The paper is theoretical and does not include any experimental validation. Consequently, there are no details provided regarding experimental setup, hyperparameters, or training configurations. |
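
The statistical robustness notion assessed in the paper concerns how the law (sampling distribution) of the empirical optimal loss changes when training data come from a distribution perturbed away from the true one, with the discrepancy measured by the Prokhorov or Kantorovich metric. As a purely illustrative aid, not the paper's RKHS-based construction, the sketch below uses Monte Carlo simulation to estimate the Kantorovich (Wasserstein-1) distance between the law of the empirical optimal loss under pure data and under a small Huber-type contamination. The toy least-squares model, contamination fractions, sample sizes, and outlier scale are arbitrary assumptions chosen only to make the idea concrete.

```python
# Illustrative sketch (not from the paper): Monte Carlo comparison of the law of
# the empirical optimal loss under pure versus contaminated training data,
# measured with the Kantorovich (Wasserstein-1) distance.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def empirical_optimal_loss(x, y):
    # Closed-form least-squares slope for the 1-D model y ~ a * x,
    # then the in-sample mean squared error at that optimum.
    a = np.dot(x, y) / np.dot(x, x)
    return np.mean((y - a * x) ** 2)

def loss_law(n, eps, n_rep=2000):
    # Sampling distribution ("law") of the empirical optimal loss.
    # eps is the contamination fraction: that share of responses receives
    # heavy additive noise, mimicking data generated by a distribution
    # perturbed from the true one.
    losses = np.empty(n_rep)
    for r in range(n_rep):
        x = rng.normal(size=n)
        y = 2.0 * x + rng.normal(scale=0.5, size=n)  # "true" data-generating model
        if eps > 0.0:
            mask = rng.random(n) < eps
            y[mask] += rng.normal(scale=10.0, size=mask.sum())  # outliers
        losses[r] = empirical_optimal_loss(x, y)
    return losses

pure = loss_law(n=200, eps=0.0)
for eps in (0.01, 0.05, 0.10):
    contaminated = loss_law(n=200, eps=eps)
    d = wasserstein_distance(pure, contaminated)
    print(f"contamination {eps:.2f}: Kantorovich distance between loss laws = {d:.4f}")
```

In this toy setup one would expect the reported distance to grow with the contamination level and to remain small for small perturbations; that qualitative behaviour is what the paper's quantitative statistical robustness bounds control rigorously for RKHS-based regression estimators.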