Rule-Enhanced Penalized Regression by Column Generation using Rectangular Maximum Agreement

Authors: Jonathan Eckstein, Noam Goldberg, Ai Kagawa

ICML 2017

Reproducibility Variable Result LLM Response
Research Type Experimental For preliminary testing of REPR, we selected 8 datasets from the UCI repository (Lichman, 2013), choosing small datasets with continuous response variables. The first four columns of Table 1 summarize the number of observations m, the number of attributes n, and the maximum number of distinguishable box-based rules |K0(X)| for these data sets.
Researcher Affiliation Academia (1) Management Science and Information Systems, Rutgers University, Piscataway, NJ, USA; (2) Department of Management, Bar-Ilan University, Ramat Gan, Israel; (3) Doctoral Program in Operations Research, Rutgers University, Piscataway, NJ, USA.
Pseudocode Yes Algorithm 1 Preprocessing discretization algorithm
Open Source Code No The paper mentions using PEBBL, an 'open-source C++ framework,' but does not state that the code for the REPR methodology itself is open-source or provide a link to it.
Open Datasets Yes For preliminary testing of REPR, we selected 8 datasets from the UCI repository (Lichman, 2013), choosing small datasets with continuous response variables.
Dataset Splits No The paper states 'each partition consists of 80% training data and 20% testing data,' but does not explicitly mention a separate validation split.
Hardware Specification Yes The last two columns of Table 1 show, for a 16-core Xeon E5-2660 workstation, REPR's average total run time per data partition and the average number of search nodes per invocation of RMA.
Software Dependencies Yes We implemented the algorithm in C++, using the Gurobi commercial optimizer (Gurobi Optimization, 2016) to solve the restricted master problems. We implemented the RMA algorithm using the PEBBL C++ class library (Eckstein et al., 2015), an open-source C++ framework for parallel branch and bound.
Experiment Setup Yes In our initial testing, we focused on the p = 2 case in which fitting errors are penalized quadratically, and set t = 1, that is, we added one model rule per REPR iteration. We set the iteration limit S to 100 and effectively set the termination tolerance θ so that REPR terminated when z′ ≤ max{0, E·|E[y]| − 0.1σ[y]} + 0.001, where E[y] denotes the sample mean of the response variable and σ[y] its sample standard deviation. We also chose C = 1 and E = 1. We used δ = 0 for SERVO, YACHT, and MPG, and δ = 0.005 for the remaining datasets.
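
The settings quoted in the Dataset Splits and Experiment Setup rows can be collected into a small Python sketch for anyone attempting a reproduction. This is not the authors' code (their implementation is in C++ and is not released): the class, function, and field names are hypothetical, the shuffling scheme and seed are assumptions because the rows do not say how partitions were drawn, and the termination threshold assumes the criterion reads z′ ≤ max{0, E·|E[y]| − 0.1σ[y]} + 0.001.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class ReprSettings:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    p: int = 2                   # fitting errors penalized quadratically
    t: int = 1                   # one model rule added per REPR iteration
    iteration_limit: int = 100   # S
    C: float = 1.0
    E: float = 1.0


# delta = 0 for these three datasets and 0.005 for all others, as quoted.
ZERO_DELTA_DATASETS = {"SERVO", "YACHT", "MPG"}


def delta_for(dataset_name: str) -> float:
    """Return the discretization parameter delta used for a dataset."""
    return 0.0 if dataset_name.upper() in ZERO_DELTA_DATASETS else 0.005


def split_80_20(num_observations: int, seed: int):
    """Shuffle row indices and split them 80% train / 20% test.

    ASSUMPTION: uniform random shuffling with a caller-supplied seed;
    the summary does not state how the partitions were generated.
    """
    indices = list(range(num_observations))
    random.Random(seed).shuffle(indices)
    cut = (4 * num_observations) // 5  # integer arithmetic avoids float rounding
    return indices[:cut], indices[cut:]


def termination_threshold(y, E: float = 1.0) -> float:
    """ASSUMED reading of the stopping rule: stop once z' <= this value.

    Computes max{0, E * |mean(y)| - 0.1 * stdev(y)} + 0.001, where stdev
    is the sample standard deviation of the response variable.
    """
    mean_y = statistics.fmean(y)
    sd_y = statistics.stdev(y)
    return max(0.0, E * abs(mean_y) - 0.1 * sd_y) + 0.001
```

For example, `split_80_20(308, seed=0)` returns 246 training indices and 62 test indices; the row count 308 is an arbitrary illustration, not a figure taken from Table 1.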