Provable guarantees for decision tree induction: the agnostic setting
Authors: Guy Blanc, Jane Lange, Li-Yang Tan
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We give strengthened provable guarantees on the performance of widely employed and empirically successful top-down decision tree learning heuristics. While prior works have focused on the realizable setting, we consider the more realistic and challenging agnostic setting. We show that for all monotone functions f and s ∈ ℕ, these heuristics construct a decision tree of size s^{O((log s)/ε²)} that achieves error ≤ opt_s + ε, where opt_s denotes the error of the optimal size-s decision tree for f. Previously such a guarantee was not known to be achievable by any algorithm, even one that is not based on top-down heuristics. We complement our algorithmic guarantee with a near-matching s^{Ω̃(log s)} lower bound. |
| Researcher Affiliation | Academia | Stanford University. Correspondence to: Guy Blanc <gblanc@stanford.edu>, Jane Lange <jlange20@stanford.edu>, Li-Yang Tan <liyang@cs.stanford.edu>. |
| Pseudocode | Yes | BUILDTOPDOWNDT_{G,D}(f, t): Initialize T to be the empty tree. While size(T) < t: grow T by splitting a leaf ℓ with a query to 1[x_i ≥ θ], where ℓ, i, and θ maximize G-impurity_{f,D}(T) − G-impurity_{f,D}(T_{ℓ, 1[x_i ≥ θ]}), the purity gain with respect to G and D. Output the f-completion of T. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code release related to the methodology described. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets, thus no dataset access information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation with data splits. |
| Hardware Specification | No | The paper is theoretical and does not report on experimental hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings. |
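The pseudocode row above describes a generic greedy top-down heuristic: repeatedly split the leaf/feature pair with the largest purity gain until the size budget is reached, then label each leaf by majority vote (the "f-completion"). A minimal sketch of this loop for 0/1 features with the Gini impurity, G(p) = 2p(1−p), might look as follows; the function names and the nested-index representation of leaves are illustrative choices, not the paper's implementation:

```python
import numpy as np

def gini(y):
    """Gini impurity G(p) = 2p(1-p) of a 0/1 label vector (0.0 if empty)."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2 * p * (1 - p)

def build_top_down(X, y, size_budget):
    """Greedy top-down induction over 0/1 features: at each step, split the
    (leaf, feature) pair with the largest weighted purity gain, stopping at
    `size_budget` leaves or when no split improves purity.

    Returns the tree's predictions on the training points and the sequence
    of features chosen, in order."""
    n = len(y)
    # Each leaf is represented by the subset of example indices reaching it.
    leaves = [np.arange(n)]
    splits = []
    while len(leaves) < size_budget:
        best = None  # (gain, leaf_position, feature)
        for li, idx in enumerate(leaves):
            if len(idx) == 0:
                continue
            base = gini(y[idx]) * len(idx) / n
            for j in range(X.shape[1]):
                left = idx[X[idx, j] == 0]
                right = idx[X[idx, j] == 1]
                after = (gini(y[left]) * len(left) +
                         gini(y[right]) * len(right)) / n
                gain = base - after  # purity gain of this candidate split
                if best is None or gain > best[0]:
                    best = (gain, li, j)
        if best is None or best[0] <= 0:
            break  # no split improves purity
        _, li, j = best
        idx = leaves.pop(li)
        leaves += [idx[X[idx, j] == 0], idx[X[idx, j] == 1]]
        splits.append(j)
    # "f-completion": label each leaf with its majority class.
    preds = np.empty(n, dtype=int)
    for idx in leaves:
        if len(idx):
            preds[idx] = int(round(y[idx].mean()))
    return preds, splits
```

For example, on the four points of the AND function (y = x₀ ∧ x₁) with a budget of three leaves, the loop first splits on one variable, then splits the remaining impure leaf on the other, recovering the function exactly.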