reproducibilityindex.ai

Rule-Enhanced Penalized Regression by Column Generation using Rectangular Maximum Agreement

Authors: Jonathan Eckstein, Noam Goldberg, Ai Kagawa

ICML 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	For preliminary testing of REPR, we selected 8 datasets from the UCI repository (Lichman, 2013), choosing small datasets with continuous response variables. The first four columns of Table 1 summarize the number of observations m, the number of attributes n, and the maximum number of distinguishable box-based rules |K0(X)| for these data sets.
Researcher Affiliation	Academia	1Management Science and Information Systems, Rutgers University, Piscataway, NJ, USA 2Department of Management, Bar-Ilan University, Ramat Gan, Israel 3Doctoral Program in Operations Research, Rutgers University, Piscataway, NJ, USA.
Pseudocode	Yes	Algorithm 1 Preprocessing discretization algorithm
Open Source Code	No	The paper mentions using PEBBL, an 'open-source C++ framework,' but does not state that the code for the REPR methodology itself is open-source or provide a link to it.
Open Datasets	Yes	For preliminary testing of REPR, we selected 8 datasets from the UCI repository (Lichman, 2013), choosing small datasets with continuous response variables.
Dataset Splits	No	The paper states 'each partition consists of 80% training data and 20% testing data,' but does not explicitly mention a separate validation split.
Hardware Specification	Yes	The last two columns of Table 1 show, for a 16-core Xeon E5-2660 workstation, REPR's average total run time per data partition and the average number of search nodes per invocation of RMA.
Software Dependencies	Yes	We implemented the algorithm in C++, using the Gurobi commercial optimizer (Gurobi Optimization, 2016) to solve the restricted master problems. We implemented the RMA algorithm using the PEBBL C++ class library (Eckstein et al., 2015), an open-source C++ framework for parallel branch and bound.
Experiment Setup	Yes	In our initial testing, we focused on the p = 2 case in which fitting errors are penalized quadratically, and set t = 1, that is, we added one model rule per REPR iteration. We set the iteration limit S to 100 and effectively set the termination tolerance θ so that REPR terminated when z′ ≤ max{0, E·|E[y]| − 0.1σ[y]} + 0.001, where E[y] denotes the sample mean of the response variable and σ[y] its sample standard deviation. We also chose C = 1 and E = 1. We used δ = 0 for SERVO, YACHT, and MPG, and δ = 0.005 for the remaining datasets.
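
The termination rule quoted in the Experiment Setup row can be illustrated with a small sketch. It reads the criterion as z′ ≤ max{0, E·|E[y]| − 0.1σ[y]} + 0.001, which is an interpretation of the quoted text rather than code from the paper. The function and parameter names (`termination_threshold`, `should_terminate`, `sd_fraction`, `slack`) are hypothetical, and the n − 1 standard deviation is an assumption about what "sample standard deviation" means here.

```python
import statistics


def termination_threshold(y, E=1.0, sd_fraction=0.1, slack=0.001):
    """Right-hand side of the assumed stopping test.

    y           -- response values of the training partition
    E           -- rule penalty parameter (the quoted setup uses E = 1)
    sd_fraction -- multiplier on the standard deviation (0.1 in the quote)
    slack       -- additive constant (0.001 in the quote)
    """
    mean_y = statistics.fmean(y)   # sample mean E[y]
    sd_y = statistics.stdev(y)     # sample standard deviation (n - 1), assumed
    return max(0.0, E * abs(mean_y) - sd_fraction * sd_y) + slack


def should_terminate(z_prime, y, E=1.0):
    """True when the pricing value z' falls at or below the threshold."""
    return z_prime <= termination_threshold(y, E)
```

For a response vector `[1.0, 2.0, 3.0]` the mean is 2 and the standard deviation is 1, so the threshold is max{0, 2 − 0.1} + 0.001 = 1.901. For a response centred at zero the `max` clamps at 0 and only the 0.001 slack remains.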