Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Convex Regression with Interpretable Sharp Partitions
Authors: Ashley Petersen, Noah Simon, Daniela Witten
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set. |
| Researcher Affiliation | Academia | Department of Biostatistics University of Washington Seattle, WA 98195 |
| Pseudocode | Yes | Algorithm 1 Alternating Directions Method of Multipliers for Equation (4) ... Algorithm 2 Block Coordinate Descent for CRISP with p > 2 (Equation (13)) |
| Open Source Code | No | The paper mentions 'Our Python implementation of CRISP' and 'FLAM (implemented with the R package flam (Petersen, 2014)); CART (implemented with the R package rpart (Therneau et al., 2014)); TPS (implemented with the R package fields (Nychka et al., 2014))', but it does not provide an explicit link or statement about the open-sourcing of the code for the CRISP methodology itself. The R packages mentioned are for third-party or related methods, not the direct implementation of CRISP. |
| Open Datasets | Yes | The data set was originally considered in Pace and Barry (1997) and is publicly available from the Carnegie Mellon Stat Lib data repository (lib.stat.cmu.edu). |
| Dataset Splits | Yes | We consider five different training set sizes: 100, 500, 1000, 5000, and 11,198 (which corresponds to 60% of the observations). We use the observations not selected for the training set as the test set. |
| Hardware Specification | Yes | On a Macbook Pro with a 2.0 GHz Intel Sandy Bridge Core i7 processor, our Python implementation of CRISP with n = q = 50 takes 20.1 seconds for a sequence of 20 λ values. |
| Software Dependencies | Yes | FLAM (implemented with the R package flam (Petersen, 2014)); CART (implemented with the R package rpart (Therneau et al., 2014)); TPS (implemented with the R package fields (Nychka et al., 2014))... R package version 1.0 for flam, R package version 4.1-8 for rpart, R package version 7.1 for fields. |
| Experiment Setup | Yes | We generate data with either n = 100 or n = 10,000, and p = 2. We independently sample each element of x1 and x2 from a Unif[−2.5, 2.5] distribution, and then take y = f(x1, x2) + ϵ, where ϵ ∼ MVN(0, σ²Iₙ) with σ = 1 for n = 100 and σ = 10 for n = 10,000. ... For each scenario, we generate 200 data sets and estimate M using CRISP (with q = 100) and several competitors. |
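
The data-generating process quoted under Experiment Setup and the hold-out scheme quoted under Dataset Splits can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `simulate_dataset` and `split_indices` are hypothetical, and `example_mean` is a placeholder mean function, since the excerpts above do not specify the paper's actual f. The random split is also an assumption, as the excerpt does not say how training observations were selected.

```python
import random


def simulate_dataset(mean_fn, n, sigma, rng):
    """Draw one simulated data set with p = 2 features.

    Each feature is sampled independently from Unif[-2.5, 2.5], and
    y = mean_fn(x1, x2) + eps with eps ~ N(0, sigma^2), drawn independently
    per observation (equivalent to MVN(0, sigma^2 I_n)).
    """
    xs = [(rng.uniform(-2.5, 2.5), rng.uniform(-2.5, 2.5)) for _ in range(n)]
    ys = [mean_fn(x1, x2) + rng.gauss(0.0, sigma) for x1, x2 in xs]
    return xs, ys


def split_indices(n_total, n_train, rng):
    """Pick n_train indices at random for training; the rest form the test set.

    Assumption: the training observations are chosen uniformly at random.
    """
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def example_mean(x1, x2):
    """Hypothetical placeholder mean function; NOT one of the paper's scenarios."""
    return x1 + x2


rng = random.Random(0)

# Small-sample setting from the quoted setup: n = 100 with sigma = 1.
xs, ys = simulate_dataset(example_mean, n=100, sigma=1.0, rng=rng)

# Hold-out split: 100 training observations out of a hypothetical total of 1000.
train_idx, test_idx = split_indices(n_total=1000, n_train=100, rng=rng)
```

Repeating `simulate_dataset` 200 times per scenario would mirror the replication count quoted above. The n = 10,000 setting uses `sigma=10.0` instead.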