Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Prediction regions through Inverse Regression

Authors: Emilie Devijver, Emeline Perthame

JMLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The performances of the proposed estimators and prediction regions are also analyzed through a simulation study and compared with usual estimators. (Abstract) and The finite-sample performance of the proposed confidence and prediction regions are investigated in Section 5, which also includes a comparison with existing methods namely least squares and Lasso. (Section 1, last paragraph) and Section 5 is titled Simulations.
Researcher Affiliation	Academia	Emilie Devijver EMAIL Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, 38000 Grenoble, France, Emeline Perthame EMAIL Hub de Bioinformatique et Biostatistique Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
Pseudocode	No	The paper describes mathematical models, theorems, and estimation procedures but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The R code to use the 3 compared methods on simulated data is available at https://research.pasteur.fr/fr/member/emeline-perthame/.
Open Datasets	No	Data are simulated according to an inverse regression model and forward parameters are deduced from Equation (6). (Section 5.1). This indicates simulated data, not a publicly available dataset.
Dataset Splits	Yes	For each simulated design, 1 000 learning datasets with dimension (N, D) are generated as well as 1 000 corresponding testing observations.
Hardware Specification	Yes	computation time (on log scale) required to compute the prediction region on a Mac Book Pro 2,9 GHz Intel Core i5 processor RAM 16 Go with programs written in R.
Software Dependencies	No	The paper mentions using 'glmnet R package' and 'R package matrixcalc' but does not specify version numbers for these packages or for R itself.
Experiment Setup	Yes	The response dimension L is varying in {1, 2, 5}. (Section 5.1); for D = 100, we consider a high-dimensional one with N = 50, an asymptotic one with N = 500 and an intermediate design with N = 100. We also study a design with D = 1000 and N = 100 (Section 5.1); In this simulation study, the level of confidence for prediction regions is set to 95%. (Section 5.1); By repeating this procedure B = 100 times, the distribution of the prediction is estimated. (Section 5.1)
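The settings quoted in the Experiment Setup row can be collected into a small configuration grid. The following is only a sketch of that bookkeeping in Python (the paper's own programs are written in R). The function and field names are hypothetical, and crossing every response dimension L with every (D, N) design is an assumption, since the quoted text does not say whether all combinations were run.

```python
from itertools import product

# Values quoted from Section 5.1 of the paper.
RESPONSE_DIMS = (1, 2, 5)            # response dimension L
DESIGNS = (                          # (D, N) pairs
    (100, 50),                       # high-dimensional design
    (100, 100),                      # intermediate design
    (100, 500),                      # asymptotic design
    (1000, 100),                     # additional design with D = 1000
)
N_LEARNING_DATASETS = 1000           # learning datasets per simulated design
N_REPEATS_B = 100                    # B, repetitions used to estimate the prediction distribution
CONFIDENCE_LEVEL = 0.95              # level of the prediction regions


def simulation_grid():
    """Return one dict per (L, D, N) setting.

    Assumption: every L is crossed with every (D, N) design.
    """
    return [
        {
            "L": L,
            "D": D,
            "N": N,
            "n_learning_datasets": N_LEARNING_DATASETS,
            "B": N_REPEATS_B,
            "confidence_level": CONFIDENCE_LEVEL,
        }
        for L, (D, N) in product(RESPONSE_DIMS, DESIGNS)
    ]


if __name__ == "__main__":
    for config in simulation_grid():
        print(config)
```

Under the full-crossing assumption the grid contains 3 x 4 = 12 settings. A real replication would also need the data-generating model from the paper, which is not reproduced here.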