Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Convex Regression with Interpretable Sharp Partitions

Authors: Ashley Petersen, Noah Simon, Daniela Witten

JMLR 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set.
Researcher Affiliation	Academia	Department of Biostatistics, University of Washington, Seattle, WA 98195
Pseudocode	Yes	Algorithm 1 Alternating Directions Method of Multipliers for Equation (4) ... Algorithm 2 Block Coordinate Descent for CRISP with p > 2 (Equation (13))
Open Source Code	No	The paper mentions 'Our Python implementation of CRISP' and 'FLAM (implemented with the R package flam (Petersen, 2014)); CART (implemented with the R package rpart (Therneau et al., 2014)); TPS (implemented with the R package fields (Nychka et al., 2014))', but it does not provide an explicit link or statement about the open-sourcing of the code for the CRISP methodology itself. The R packages mentioned are for third-party or related methods, not the direct implementation of CRISP.
Open Datasets	Yes	The data set was originally considered in Pace and Barry (1997) and is publicly available from the Carnegie Mellon Stat Lib data repository (lib.stat.cmu.edu).
Dataset Splits	Yes	We consider five different training set sizes: 100, 500, 1000, 5000, and 11,198 (which corresponds to 60% of the observations). We use the observations not selected for the training set as the test set.
Hardware Specification	Yes	On a Macbook Pro with a 2.0 GHz Intel Sandy Bridge Core i7 processor, our Python implementation of CRISP with n = q = 50 takes 20.1 seconds for a sequence of 20 λ values.
Software Dependencies	Yes	FLAM (implemented with the R package flam (Petersen, 2014)); CART (implemented with the R package rpart (Therneau et al., 2014)); TPS (implemented with the R package fields (Nychka et al., 2014))... R package version 1.0 for flam, R package version 4.1-8 for rpart, R package version 7.1 for fields.
Experiment Setup	Yes	We generate data with either n = 100 or n = 10,000, and p = 2. We independently sample each element of x1 and x2 from a Unif[−2.5, 2.5] distribution, and then take y = f(x1, x2) + ϵ, where ϵ ∼ MVN(0, σ²I_n) with σ = 1 for n = 100 and σ = 10 for n = 10,000. ... For each scenario, we generate 200 data sets and estimate M using CRISP (with q = 100) and several competitors.
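The Experiment Setup and Dataset Splits rows describe a data-generating process and a train/test protocol. The sketch below illustrates both under stated assumptions; it is not the authors' code. The mean function `f_example` is hypothetical (the paper defines its own simulation scenarios, which are not reproduced here), the training set is assumed to be drawn uniformly at random without replacement, and the total number of housing observations is left as a parameter `n_total` because this page does not state it. Because the noise is MVN(0, σ²I_n), it is equivalent to drawing n independent N(0, σ²) values.

```python
import random
from typing import Callable, List, Tuple


def simulate(
    n: int,
    sigma: float,
    f: Callable[[float, float], float],
    seed: int = 0,
) -> Tuple[List[float], List[float], List[float]]:
    """Draw x1, x2 i.i.d. from Unif[-2.5, 2.5] and set y = f(x1, x2) + eps.

    eps ~ MVN(0, sigma^2 I_n) is generated as n independent N(0, sigma^2) draws.
    """
    rng = random.Random(seed)
    x1 = [rng.uniform(-2.5, 2.5) for _ in range(n)]
    x2 = [rng.uniform(-2.5, 2.5) for _ in range(n)]
    y = [f(a, b) + rng.gauss(0.0, sigma) for a, b in zip(x1, x2)]
    return x1, x2, y


def train_test_split(
    n_total: int, n_train: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Assumed protocol: pick n_train indices at random for training;
    every index not selected forms the test set."""
    if not 0 < n_train <= n_total:
        raise ValueError("n_train must satisfy 0 < n_train <= n_total")
    rng = random.Random(seed)
    train = sorted(rng.sample(range(n_total), n_train))
    chosen = set(train)
    test = [i for i in range(n_total) if i not in chosen]
    return train, test


def f_example(a: float, b: float) -> float:
    """Hypothetical piecewise-constant mean function, for illustration only."""
    return 10.0 if a * b > 0 else -10.0


if __name__ == "__main__":
    # 200 replicate data sets for the n = 100, sigma = 1 setting.
    datasets = [simulate(100, 1.0, f_example, seed=s) for s in range(200)]
    # Example split with a placeholder total size (not taken from the paper).
    train, test = train_test_split(n_total=1000, n_train=100, seed=1)
    print(len(datasets), len(train), len(test))
```

Seeding each replicate separately (`seed=s`) keeps every simulated data set reproducible on its own, which matters when comparing CRISP against competitors on identical draws.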