Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Rule-Enhanced Penalized Regression by Column Generation using Rectangular Maximum Agreement
Authors: Jonathan Eckstein, Noam Goldberg, Ai Kagawa
ICML 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For preliminary testing of REPR, we selected 8 datasets from the UCI repository (Lichman, 2013), choosing small datasets with continuous response variables. The first four columns of Table 1 summarize the number of observations m, the number of attributes n, and the maximum number of distinguishable box-based rules |K0(X)| for these data sets. |
| Researcher Affiliation | Academia | 1Management Science and Information Systems, Rutgers University, Piscataway, NJ, USA 2Department of Management, Bar-Ilan University, Ramat Gan, Israel 3Doctoral Program in Operations Research, Rutgers University, Piscataway, NJ, USA. |
| Pseudocode | Yes | Algorithm 1 Preprocessing discretization algorithm |
| Open Source Code | No | The paper mentions using PEBBL, an 'open-source C++ framework,' but does not state that the code for the REPR methodology itself is open-source or provide a link to it. |
| Open Datasets | Yes | For preliminary testing of REPR, we selected 8 datasets from the UCI repository (Lichman, 2013), choosing small datasets with continuous response variables. |
| Dataset Splits | No | The paper states 'each partition consists of 80% training data and 20% testing data,' but does not explicitly mention a separate validation split. |
| Hardware Specification | Yes | The last two columns of Table 1 show, for a 16-core Xeon E5-2660 workstation, REPR's average total run time per data partition and the average number of search node per invocation of RMA. |
| Software Dependencies | Yes | We implemented the algorithm in C++, using the Gurobi commercial optimizer (Gurobi Optimization, 2016) to solve the restricted master problems. We implemented the RMA algorithm using the PEBBL C++ class library (Eckstein et al., 2015), an open-source C++ framework for parallel branch and bound. |
| Experiment Setup | Yes | In our initial testing, we focused on the p = 2 case in which fitting errors are penalized quadratically, and set t = 1, that is, we added one model rule per REPR iteration. We set the iteration limit S to 100 and effectively set the termination tolerance θ so that REPR terminated when z1 max 0, E |E[y]| 0.1σ[y] + 0.001, where E[y] denotes the sample mean of the response variable and σ[y] its sample standard deviation. We also chose C = 1 and E = 1. We used δ = 0 for SERVO, YACHT, and MPG, and δ = 0.005 for the remaining datasets. |
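
The settings quoted in the Experiment Setup and Dataset Splits rows can be collected into a small configuration sketch. This is only an illustration, not the authors' code (which was written in C++): the names `REPR_SETTINGS`, `delta_for`, and `split_80_20` are hypothetical, the split is assumed to be a simple random shuffle because the table does not say how partitions were drawn, and the termination tolerance is left out because its formula is garbled in the extracted text.

```python
import random

# Values quoted in the Experiment Setup row; the key names are hypothetical.
REPR_SETTINGS = {
    "p": 2,    # fitting errors are penalized quadratically
    "t": 1,    # one model rule added per REPR iteration
    "S": 100,  # iteration limit
    "C": 1,
    "E": 1,
}

# Datasets for which the table reports delta = 0.
ZERO_DELTA_DATASETS = {"SERVO", "YACHT", "MPG"}


def delta_for(dataset_name):
    """Return the delta value the table reports for a given dataset name."""
    return 0.0 if dataset_name.upper() in ZERO_DELTA_DATASETS else 0.005


def split_80_20(rows, seed=0):
    """Shuffle rows and split them 80% train / 20% test.

    Assumption: a seeded random shuffle; the source only states the ratio.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = split_80_20(range(100))
    print(len(train), len(test), delta_for("YACHT"), delta_for("other"))
```

Note that no validation subset is carved out here, which matches the Dataset Splits row: only a train/test partition is described.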