Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Quantile regression with ReLU Networks: Estimators and minimax rates

Authors: Oscar Hernan Madrid Padilla, Wesley Tansey, Yanzhen Chen

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical simulations on a suite of synthetic response functions demonstrate the theoretical results translate to practical implementations of ReLU networks. Overall, the theoretical and empirical results provide insight into the strong performance of ReLU neural networks for quantile regression across a broad range of function classes and error distributions. We study the performance of ReLU networks for quantile regression across a suite of heavy-tailed synthetic and real-data benchmarks. We assess the performance of all methods using the mean squared error (MSE) between the estimated and true quantile functions.
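Quantile regression estimators of the kind assessed above are fit by minimizing the check (pinball) loss at a quantile level τ. A minimal NumPy sketch of that loss follows; the function name and arrays are illustrative, not taken from the paper's code.

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Check (pinball) loss at quantile level tau: residuals r = y - y_hat
    are weighted by tau when positive and by (tau - 1) when negative."""
    r = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(np.where(r >= 0, tau * r, (tau - 1) * r)))
```

At τ = 0.5 this is half the mean absolute error, so its minimizer is the conditional median; the asymmetric weighting at other τ is what targets the corresponding conditional quantile.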
Researcher Affiliation | Academia | Oscar Hernan Madrid Padilla EMAIL Department of Statistics, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, California, USA; Wesley Tansey EMAIL Computational Oncology, Department of Epidemiology and Biostatistics, 1275 York Avenue, New York, NY 10065; Yanzhen Chen EMAIL Department of Information Systems, Business Statistics and Operations Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Pseudocode | No | The paper describes mathematical models and theoretical derivations, but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it present structured, code-like steps for any method.
Open Source Code | Yes | All code for this paper is publicly available at https://github.com/tansey/quantile-regression.
Open Datasets | Yes | Empirical simulations on a suite of synthetic response functions demonstrate the theoretical results translate to practical implementations of ReLU networks. In each scenario the data are generated following the same location-plus-noise template, y_i = f_0(x_i) + ϵ_i, i = 1, ..., n, with x_i drawn independently from [0, 1]^d, where ϵ_i ∼ G_i for a distribution G_i on ℝ, and with f_0 : [0, 1]^d → ℝ for a choice of d that is scenario dependent. We consider 5 different scenarios following this template: [descriptions of Scenarios 1-5 follow].
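The location-plus-noise template quoted above can be sketched as follows. The response function f0, the dimension d, and the Cauchy noise are placeholders: each of the paper's five scenarios fixes its own f0, d, and noise distribution G.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 2  # placeholder sample size and dimension

def f0(x):
    # Placeholder response function; each scenario uses its own f0.
    return np.sin(2 * np.pi * x[:, 0])

x = rng.uniform(0.0, 1.0, size=(n, d))  # covariates x_i in [0, 1]^d
eps = rng.standard_cauchy(size=n)       # one heavy-tailed choice of G
y = f0(x) + eps                         # y_i = f0(x_i) + eps_i
```

Evaluating against the true quantile function is then direct: under this template the τ-quantile of y given x is f0(x) plus the τ-quantile of G.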
Dataset Splits | No | In each experiment, the methods are estimated at different training sample sizes n ∈ {100, 1000, 10000} and different quantile levels τ ∈ {0.05, 0.25, 0.50, 0.75, 0.95}. For each benchmark, we generate 25 datasets independently from the same generative model and evaluate performance using 10000 sampled covariates with the corresponding true quantile. The paper describes generating independent datasets for training and evaluating on sampled covariates, but it does not specify explicit train/validation/test splits from a single fixed dataset.
Hardware Specification | No | For the other two nonparametric methods, we choose parameters to be flexible enough to capture a large number of nonlinearities while still computationally feasible on a laptop for moderate-sized problems. The paper mentions experiments were run on a 'laptop' but does not provide specific hardware details like CPU or GPU models, or memory specifications.
Software Dependencies | No | For the two neural network methods, we train the models using stochastic gradient descent (SGD) as implemented in PyTorch (Paszke et al., 2019) with Nesterov momentum of 0.9, starting learning rate of 0.1, and stepwise decay 0.5. For Quantile Splines, we use a natural spline basis with 3 degrees of freedom; we use the implementation available in the statsmodels package. For Regression Forests, we use 100 tree estimators and a minimum sample count for splits of 10; these are defaults in the scikit-garden package. While software packages like PyTorch, statsmodels, and scikit-garden are mentioned, no specific version numbers are provided for these dependencies.
Experiment Setup | Yes | For the two neural network methods, we train the models using stochastic gradient descent (SGD) as implemented in PyTorch (Paszke et al., 2019) with Nesterov momentum of 0.9, starting learning rate of 0.1, and stepwise decay 0.5. The neural network models also use the same architecture: two hidden layers of 200 units each, with dropout rate of 0.1 and batch normalization in each layer. For Quantile Splines, we use a natural spline basis with 3 degrees of freedom; we use the implementation available in the statsmodels package. For Regression Forests, we use 100 tree estimators and a minimum sample count for splits of 10; these are defaults in the scikit-garden package.
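The quoted setup pins down most of the neural network configuration. A minimal PyTorch sketch consistent with it is below; the input dimension, the layer ordering of batch norm/ReLU/dropout, and the decay interval are assumptions, since the quoted text does not state them.

```python
import torch
import torch.nn as nn

d_in = 2  # placeholder input dimension; scenario dependent in the paper

# Two hidden layers of 200 units, each with batch normalization and
# dropout 0.1; the exact ordering of these operations is an assumption.
model = nn.Sequential(
    nn.Linear(d_in, 200), nn.BatchNorm1d(200), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(200, 200), nn.BatchNorm1d(200), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(200, 1),
)

# SGD with Nesterov momentum 0.9 and starting learning rate 0.1, as quoted.
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)

# Stepwise decay by a factor of 0.5; the decay interval is not specified
# in the quoted text, so step_size=10 is a placeholder.
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)
```

Calling `sched.step()` after each epoch halves the learning rate every `step_size` epochs, matching the "stepwise decay 0.5" description up to the unspecified interval.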