Decision Trees for Decision-Making under the Predict-then-Optimize Framework

Authors: Adam N. Elmachtoub, Jason Cheuk Nam Liang, Ryan McNellis

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct several numerical experiments on synthetic and real data including the prediction of travel times for shortest path problems and predicting click probabilities for news article recommendation. We demonstrate on these datasets that SPOTs simultaneously provide higher quality decisions and significantly lower model complexity than other machine learning approaches (e.g., CART).
Researcher Affiliation | Collaboration | 1 Department of Industrial Engineering and Operations Research and Data Science Institute, Columbia University, NY, USA; 2 Operations Research Center, Massachusetts Institute of Technology, MA, USA; 3 Amazon, NY, USA. Publication written prior to Amazon employment.
Pseudocode | No | The paper describes the algorithms (e.g., recursive partitioning and integer programming approaches) in prose but does not present them in a structured pseudocode block or explicitly labeled algorithm section.
Open Source Code | Yes | Implementations of our algorithms and experiments may be found at https://github.com/rtm2130/SPOTree.
Open Datasets | Yes | In particular, we consider a news article recommendation problem constructed from the publicly-available Yahoo! Front Page Today Module dataset (Yahoo! Webscope, 2009). URL http://research.yahoo.com/Academic_Relations. Last accessed 1 Oct 2019.
Dataset Splits | Yes | We used records from May 1-5 for training data and from May 6-10 as test data; 50% of the training set records were additionally held out to construct a validation set for parameter tuning. (A split sketch appears below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions commercial optimization solvers such as Gurobi and CPLEX but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | All trees and forests are trained using a minimum leaf size of 20 observations. To prevent overfitting, SPOTs and CART trees are pruned on a validation set consisting of 20% of the training data using the pruning algorithm from Breiman et al. (1984). The forest algorithms are trained using B = 100 trees with no depth limit, and the number of features f ∈ {2, 3, 4, 5} to use in feature bagging is tuned using the validation set above. (An analogous configuration is sketched below.)
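
The split described in the Dataset Splits row is mechanical enough to codify. Below is a minimal Python/pandas sketch of that protocol; the file name and column names are hypothetical stand-ins, since the actual Yahoo! Front Page logs have their own schema and the authors' preprocessing lives in the SPOTree repository.

    import pandas as pd

    # Hypothetical file and schema: each record carries a parseable timestamp.
    # The real Yahoo! Front Page logs need the preprocessing scripts in the
    # authors' SPOTree repository (https://github.com/rtm2130/SPOTree).
    events = pd.read_csv("yahoo_front_page_may2009.csv", parse_dates=["timestamp"])

    # Records from May 1-5, 2009 form the training set; May 6-10 the test set.
    train = events[(events.timestamp >= "2009-05-01") & (events.timestamp < "2009-05-06")]
    test = events[(events.timestamp >= "2009-05-06") & (events.timestamp < "2009-05-11")]

    # Hold out 50% of the training records as a validation set for tuning.
    valid = train.sample(frac=0.5, random_state=0)
    train = train.drop(valid.index)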
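
The Experiment Setup row pins down most hyperparameters, so an analogous configuration can be sketched in scikit-learn: minimum leaf size of 20, a 20% validation split, pruning tuned on that split, and 100-tree forests with the feature-bagging width searched over {2, 3, 4, 5}. This is an analogy only, not the authors' method: scikit-learn trees minimize squared prediction error rather than the SPO loss, sklearn's ccp_alpha search stands in for the Breiman et al. (1984) pruning procedure, and the synthetic X, y are placeholders.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor

    # Placeholder data: 500 samples, 5 features.
    rng = np.random.default_rng(0)
    X, y = rng.random((500, 5)), rng.random(500)

    # Reserve 20% of the training data as a validation set, as in the paper.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    # Tune the cost-complexity pruning strength on the validation set
    # (a stand-in for the Breiman et al. (1984) pruning used by the authors).
    path = DecisionTreeRegressor(min_samples_leaf=20).cost_complexity_pruning_path(X_tr, y_tr)
    best_alpha = max(path.ccp_alphas, key=lambda a: DecisionTreeRegressor(
        min_samples_leaf=20, ccp_alpha=a).fit(X_tr, y_tr).score(X_val, y_val))
    tree = DecisionTreeRegressor(min_samples_leaf=20, ccp_alpha=best_alpha).fit(X_tr, y_tr)

    # Forests: B = 100 trees, no depth limit, feature-bagging width f tuned
    # over {2, 3, 4, 5} on the same validation set.
    best_f = max(range(2, 6), key=lambda f: RandomForestRegressor(
        n_estimators=100, max_features=f, min_samples_leaf=20,
        random_state=0).fit(X_tr, y_tr).score(X_val, y_val))
    forest = RandomForestRegressor(n_estimators=100, max_features=best_f,
                                   min_samples_leaf=20, random_state=0).fit(X_tr, y_tr)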