Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Non-parametric Quantile Regression via the K-NN Fused Lasso

Authors: Steven Siwei Ye, Oscar Hernan Madrid Padilla

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical experiments on simulated and real data demonstrate clear advantages of the proposed estimator over state of the art methods. All codes that implement the algorithms and the datasets used in the experiments are publicly available on the author's Github page (https://github.com/stevenysw/qt_knnfl). In Section 5, we list the results of numerical experiments on multiple simulated datasets and two real datasets, California housing data and Chicago crime data. The experiments show that the proposed estimator outperforms state-of-the-art methods on both simulated and real datasets.
Researcher Affiliation	Academia	Steven Siwei Ye EMAIL Department of Statistics University of California, Los Angeles Los Angeles, CA 90095, USA Oscar Hernan Madrid Padilla EMAIL Department of Statistics University of California, Los Angeles Los Angeles, CA 90095, USA
Pseudocode	Yes	Algorithm 1: Alternating Directions Method of Multipliers for quantile K-NN fused lasso ... Algorithm 2: Majorize-Minimize for quantile K-NN fused lasso, τ = 0.5
Open Source Code	Yes	All codes that implement the algorithms and the datasets used in the experiments are publicly available on the author's Github page (https://github.com/stevenysw/qt_knnfl).
Open Datasets	Yes	Numerical experiments on simulated and real data... 5.2.1 California Housing Data... is publicly available from the Carnegie Mellon Stat Lib data repository (lib.stat.cmu.edu). 5.2.2 Chicago Crime Data... a dataset of publicly-available crime report counts in Chicago, Illinois in 2015.
Dataset Splits	Yes	5.2.1 California Housing Data: We perform 100 train-test random splits of the data, with training sizes 1000, 5000, and 10000. For each split the data not in the training set is treated as testing data. ... 5.2.2 Chicago Crime Data: ... we perform a train-test split with training size 500, 1000, 1500, and 2000.
Hardware Specification	No	The paper does not explicitly describe the hardware used for its experiments. It discusses computational time but does not mention specific CPU, GPU models, or memory.
Software Dependencies	No	For quantile random forest, we directly use the R package quantregForest with defaulted choice of tree structure and tuning parameters. This mentions a software package but does not provide a specific version number, which is required for a reproducible description.
Experiment Setup	Yes	For quantile K-NN fused lasso, we use the ADMM algorithm and select the tuning parameter λ based on the BIC criteria described in Section 3.3... For quantile random forest, we directly use the R package quantregForest with defaulted choice of tree structure and tuning parameters. Throughout, for both K-NN fused lasso and quantile K-NN fused lasso, we set K to be 5 for sufficient information and efficient computation.
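
The split procedure quoted in the Dataset Splits row (repeated random train-test splits, with all data outside the training set used for testing) could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function name `random_train_test_splits`, the seed, and the sample count of 12000 are all hypothetical, since the quoted text reports neither a seed nor the dataset size.

```python
import random


def random_train_test_splits(n_samples, train_size, n_splits=100, seed=0):
    """Yield (train_idx, test_idx) index lists for repeated random splits.

    Every index not drawn into the training set goes to the test set,
    matching the description in the Dataset Splits row.
    The seed is an assumption; the quoted text does not report one.
    """
    if not 0 < train_size < n_samples:
        raise ValueError("train_size must be between 1 and n_samples - 1")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_splits):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # First train_size shuffled indices form the training set,
        # the remainder forms the test set.
        yield sorted(shuffled[:train_size]), sorted(shuffled[train_size:])


# Hypothetical sample count; the training sizes are the ones quoted
# for the California housing experiment.
N_SAMPLES = 12000
for train_size in (1000, 5000, 10000):
    splits = list(random_train_test_splits(N_SAMPLES, train_size, n_splits=100))
    print(train_size, len(splits), len(splits[0][1]))
```

Fixing the seed makes the 100 splits repeatable across runs, which is one of the details a reader would need to reproduce split-dependent results exactly.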