Efficient Online Decision Tree Learning with Active Feature Acquisition

Authors: Arman Rahbar, Ziyu Ye, Yuxin Chen, Morteza Haghir Chehreghani

IJCAI 2023

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | We demonstrate the efficiency and effectiveness of our framework via extensive experiments on various real-world datasets. Our framework also naturally adapts to the challenging setting of online learning with concept drift and is shown to be competitive with baseline models while being more flexible.

Researcher Affiliation | Academia | 1) Chalmers University of Technology; 2) University of Chicago

Pseudocode | Yes | Algorithm 1: Online Decision Tree Learning; Algorithm 2: Posterior Update; Algorithm 3: Planning by Surrogate Optimization; Algorithm 4: Hypotheses Sampling; Algorithm 5: Threshold Selection; Algorithm 6: Handling Concept Drift

Open Source Code | Yes | Source code of UFODT is available on GitHub.

Open Datasets | Yes | Datasets. We have used three stationary datasets in our experiments that are standard binary classification datasets taken from the UCI repository [Dua and Graff, 2017]. Furthermore, we conduct experiments on the ProPublica recidivism (Compas) dataset [Larson et al., 2016] and the Fair Isaac (Fico) credit risk dataset [FICO et al., 2018], as in [Hu et al., 2019]. For concept-drift experiments, we adopt the non-stationary Stagger dataset [Widmer and Kubat, 1996; López Lobo, 2020], where each data point has three nominal attributes and the target concept changes abruptly at some point. For extensions to continuous features (as well as for feature selection in the appendix), we use the Pima Indians Diabetes Dataset [Smith et al., 1988], the Breast Cancer Wisconsin Dataset [Street et al., 1999], and the Fetal Health Dataset [Ayres-de-Campos et al., 2000].

Dataset Splits | No | The paper mentions "holdout test sets" but does not specify the train/validation/test split percentages or sample counts for any of the datasets used. It only states that the datasets are standard, or provides citations, without detailing the splits.

Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., CPU or GPU model, memory) used to run the experiments.

Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).

Experiment Setup | Yes | We use η = 0.01 in Algorithm 5.