Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On Bayes Risk Lower Bounds
Authors: Xi Chen, Adityanand Guntuboyina, Yuchen Zhang
JMLR 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper provides a general technique for lower bounding the Bayes risk of statistical estimation, applicable to arbitrary loss functions and arbitrary prior distributions. A lower bound on the Bayes risk not only serves as a lower bound on the minimax risk, but also characterizes the fundamental limit of any estimator given the prior knowledge. Our bounds are based on the notion of f-informativity (Csiszár, 1972), which is a function of the underlying class of probability measures and the prior. Application of our bounds requires upper bounds on the f-informativity, thus we derive new upper bounds on f-informativity which often lead to tight Bayes risk lower bounds. Our technique leads to generalizations of a variety of classical minimax bounds (e.g., generalized Fano's inequality). Our Bayes risk lower bounds can be directly applied to several concrete estimation problems, including Gaussian location models, generalized linear models, and principal component analysis for spiked covariance models. To further demonstrate the applications of our Bayes risk lower bounds to machine learning problems, we present two new theoretical results: (1) a precise characterization of the minimax risk of learning spherical Gaussian mixture models under the smoothed analysis framework, and (2) lower bounds for the Bayes risk under a natural prior for both the prediction and estimation errors for high-dimensional sparse linear regression under an improper learning setting. |
| Researcher Affiliation | Academia | Xi Chen, Stern School of Business, New York University, New York, NY 10012, USA; Adityanand Guntuboyina, Department of Statistics, University of California, Berkeley, CA 94720, USA; Yuchen Zhang, Computer Science Department, Stanford University, Stanford, CA 94305, USA |
| Pseudocode | No | The paper describes methods and derivations using mathematical formulations and proofs, but it does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide links to any code repositories or supplementary materials containing code. |
| Open Datasets | No | The paper focuses on theoretical derivations and applications of Bayes risk lower bounds to various statistical models (e.g., Gaussian location models, generalized linear models, spherical Gaussian mixture models, sparse linear regression). It does not conduct experiments that would require specific datasets, and therefore no dataset access information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve experimental evaluation on datasets. Therefore, it does not specify any training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical in nature, focusing on mathematical bounds and proofs rather than empirical experiments. As such, it does not describe any specific hardware used for computations or experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on mathematical derivations and proofs. It does not describe any experimental setup or implementations that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is primarily theoretical, presenting mathematical derivations and proofs for Bayes risk lower bounds. It does not describe an experimental setup with hyperparameters or training configurations for empirical validation. |
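For context on the abstract quoted above: the classical Fano inequality that the paper generalizes lower-bounds the minimax probability of error in a multiple-hypothesis-testing reduction, typically in the form P(error) ≥ 1 − (β + log 2) / log M, where M is the number of well-separated hypotheses and β bounds the pairwise KL divergences (or the mutual information). A minimal numerical sketch of that classical form follows; the function name and the example values are illustrative, not taken from the paper.

```python
import math

def fano_lower_bound(num_hypotheses, max_kl):
    """Classical Fano lower bound on the minimax error probability:
    P(error) >= 1 - (max_kl + log 2) / log(num_hypotheses),
    clipped at 0 since a probability cannot be negative."""
    return max(0.0, 1.0 - (max_kl + math.log(2)) / math.log(num_hypotheses))

# Example: M = 16 hypotheses whose pairwise KL divergence is at most 0.5.
# Any estimator must then err with probability at least ~0.57.
bound = fano_lower_bound(16, 0.5)
```

The bound is informative only when the hypotheses are numerous relative to their statistical separation; if `max_kl` is large compared to `log(num_hypotheses)`, the expression is clipped to 0 and nothing is learned, which is why the paper's tighter f-informativity-based bounds are of interest.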