Multivariate Distributionally Robust Convex Regression under Absolute Error Loss

Authors: Jose Blanchet, Peter W. Glynn, Jun Yan, Zhengqing Zhou

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "3 Numerical Experiments: In this section we investigate the performance of our estimator f̂_{n,δ}, and compare it with the least squares estimator (LSE) of convex regression in [15], as well as the kernel smoothing estimator. We summarize the results in the above table. It is clear that our method outperforms both the LSE and the kernel estimator." |
| Researcher Affiliation | Academia | Jose Blanchet (Stanford MS&E, jose.blanchet@stanford.edu); Peter W. Glynn (Stanford MS&E, glynn@stanford.edu); Jun Yan (Stanford Statistics, junyan65@stanford.edu); Zhengqing Zhou (Stanford Mathematics, zqzhou@stanford.edu) |
| Pseudocode | No | The paper describes the construction of the DRCR estimator and its associated linear program formulation, but it does not present either as a structured pseudocode or algorithm block (a hedged sketch of the nominal LP appears below the table). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology, nor a link to a code repository. |
| Open Datasets | Yes | "We consider a public dataset from United States Environmental Protection Agency, which was suggested by [17]." |
| Dataset Splits | No | For the real dataset, the paper states "we randomly split the dataset into a training set with 400 data and a test set with 200 data", but it gives no details of a validation split. For the synthetic datasets, it describes generating i.i.d. samples but no train/validation/test split. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU or GPU models, or cloud computing instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific solvers) used in the experiments. |
| Experiment Setup | Yes | "We construct our DRCR estimator f̂_{n,δ_n} by taking δ_n = n^(-2/d). For the LSE of convex regression, in line with the setting in [3, 15], let c be any numerical constant greater than ‖f‖_∞, and we consider the class of functions F_c := {f : f is convex, ‖f‖_∞ ≤ c}. Given that ‖f‖_∞ = 1, we set c = 10 or 0.8... For some bandwidth h_n > 0, we define the kernel regression estimator k̂_{n,h_n} by k̂_{n,h_n}(x) = Σ_{i=1}^n Y_i K((x - X_i)/h_n) / Σ_{i=1}^n K((x - X_i)/h_n), where K : R^d → R denotes the Gaussian kernel with K(x) = (2π)^(-d/2) e^(-‖x‖²/2). We then choose the best bandwidth h_n via cross-validation. To be specific, we pick h_n = C·n^(-1/(d+4)), and then optimize the choice of C via line search... In the experiments, we set d = 5, n ∈ {50, 100, 150, 200, 250, 300, 350}, and σ = 0.2." |
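
Since the paper formulates its estimator through a linear program but never presents it as pseudocode, the following is a minimal sketch of the nominal (non-robust) convex regression problem under absolute error loss as an LP, using numpy and scipy. This is an illustration under standard assumptions, not the paper's DRCR program: the DRCR estimator additionally involves a Wasserstein ambiguity set of radius δ_n, which is not reproduced here, and all names below are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def convex_regression_l1(X, Y):
    """Sketch: fit values g_i and subgradients xi_i of a convex function by
    minimizing sum_i |Y_i - g_i| subject to the subgradient (convexity)
    constraints g_j >= g_i + xi_i . (X_j - X_i) for all i, j.

    Variable layout: z = [g_1..g_n, t_1..t_n, xi_1..xi_n (flattened)],
    where t_i is an auxiliary bound on the absolute residual |Y_i - g_i|."""
    n, d = X.shape
    n_vars = 2 * n + n * d

    # Objective: minimize sum_i t_i.
    c = np.zeros(n_vars)
    c[n:2 * n] = 1.0

    rows, rhs = [], []
    # Encode |Y_i - g_i| <= t_i as two linear inequalities.
    for i in range(n):
        r = np.zeros(n_vars); r[i] = -1.0; r[n + i] = -1.0
        rows.append(r); rhs.append(-Y[i])      # Y_i - g_i <= t_i
        r = np.zeros(n_vars); r[i] = 1.0; r[n + i] = -1.0
        rows.append(r); rhs.append(Y[i])       # g_i - Y_i <= t_i
    # Convexity: g_i - g_j + xi_i . (X_j - X_i) <= 0 for all i != j.
    for i in range(n):
        for j in range(n):
            if i != j:
                r = np.zeros(n_vars)
                r[i], r[j] = 1.0, -1.0
                r[2 * n + i * d: 2 * n + (i + 1) * d] = X[j] - X[i]
                rows.append(r); rhs.append(0.0)

    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * n_vars, method="highs")
    g = res.x[:n]
    Xi = res.x[2 * n:].reshape(n, d)
    return g, Xi

# Toy usage: noisy samples of a convex function, then a max-affine prediction.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
Y = np.sum(X ** 2, axis=1) + 0.2 * rng.standard_normal(30)
g, Xi = convex_regression_l1(X, Y)
x0 = np.zeros(2)
f_hat = np.max(g + ((x0 - X) * Xi).sum(axis=1))  # max-affine extension at x0
```

The O(n²) convexity constraints are what make full convex regression expensive at scale; an LP solver that exploits sparsity (e.g., HiGHS above) keeps the toy sizes here comfortable.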
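The kernel smoothing baseline in the setup row also lends itself to a short sketch: a Nadaraya-Watson estimator with a Gaussian kernel, the bandwidth rule h_n = C·n^(-1/(d+4)), and a line search over C scored on held-out absolute error. The 80/20 holdout split, the grid for C, and the toy data-generating function are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def nw_predict(X_train, Y_train, X_query, h):
    """Nadaraya-Watson estimate with the Gaussian kernel; the (2*pi)^(-d/2)
    normalizing factor cancels in the weight ratio, so it is omitted."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * h ** 2))
    return (W @ Y_train) / W.sum(axis=1)

def tune_bandwidth_constant(X, Y, C_grid):
    """Pick C in h = C * m^(-1/(d+4)) by line search on a holdout split,
    scoring by mean absolute error (the loss studied in the paper)."""
    n, d = X.shape
    m = int(0.8 * n)  # illustrative 80/20 holdout, not from the paper
    Xtr, Ytr, Xva, Yva = X[:m], Y[:m], X[m:], Y[m:]
    best_C, best_err = C_grid[0], np.inf
    for C in C_grid:
        h = C * m ** (-1.0 / (d + 4))
        err = np.abs(nw_predict(Xtr, Ytr, Xva, h) - Yva).mean()
        if err < best_err:
            best_C, best_err = C, err
    return best_C

# Toy usage mirroring the reported dimensions (d = 5, sigma = 0.2).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.uniform(-1.0, 1.0, size=(n, d))
Y = np.sum(X ** 2, axis=1) + 0.2 * rng.standard_normal(n)
C = tune_bandwidth_constant(X, Y, C_grid=np.linspace(0.1, 3.0, 30))
h = C * n ** (-1.0 / (d + 4))
Y_hat = nw_predict(X, Y, X, h)
```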