Inter-node Hellinger Distance based Decision Tree

Authors: Pritom Saha Akash, Md. Eusha Kadir, Amin Ahsan Ali, Mohammad Shoyaib

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform an experiment over twenty balanced and twenty imbalanced datasets. The results show that decision trees based on iHD win against six other state-of-the-art methods on at least 14 balanced and 10 imbalanced datasets. We also observe that adding the weight to iHD improves the performance of decision trees on imbalanced datasets. Moreover, according to the result of the Friedman test, this improvement is statistically significant compared to other methods.
Researcher Affiliation | Academia | Institute of Information Technology, University of Dhaka, Bangladesh; Department of Computer Science & Engineering, Independent University, Bangladesh
Pseudocode | Yes | Algorithm 1 outlines the procedure of learning a binary DT using the proposed split criterion iHDw.
Open Source Code | No | The paper does not provide concrete access to source code (e.g., a specific repository link or an explicit code-release statement) for the methodology described.
Open Datasets | Yes | Table 2 shows 40 datasets chosen from various areas like biology, medicine and finance. [...] These datasets are collected from two well-known public sources called UCI Machine Learning Repository [Dua and Graff, 2017] and KEEL Imbalanced Data Sets [Alcalá-Fdez et al., 2011].
Dataset Splits | Yes | We conduct 10-fold cross-validation on each dataset to get the unbiased result.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | For each dataset, we build eight unpruned DT classifiers based on iHD, iHDw, information gain (using both Entropy and Gini), Gain Ratio (GR), and the splitting criteria proposed in DCSM, HDDT and CCPDT, respectively. [...] We conduct 10-fold cross-validation on each dataset to get the unbiased result.
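
The rows above reference several computational steps that the sketches below illustrate without reproducing the paper's exact algorithms. The Pseudocode row points to Algorithm 1, which learns a binary DT with the proposed iHD/iHDw criterion; since that algorithm is not reproduced on this page, the following is a minimal sketch of a Hellinger-distance-based split score in the spirit of such criteria. The function names are hypothetical, and the paper's exact inter-node formulation and its weighted variant iHDw differ in detail.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions p and q."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def class_distribution(y, classes):
    """Empirical class distribution of the label vector y."""
    counts = np.array([np.sum(y == c) for c in classes], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def hellinger_split_score(y_left, y_right, classes):
    """Score a candidate binary split by the Hellinger distance between
    the class distributions of the two child nodes (larger means the
    children separate the classes better). Illustrative only: the
    paper's iHD/iHDw criteria are related but not identical."""
    return hellinger(class_distribution(y_left, classes),
                     class_distribution(y_right, classes))

# Toy example: a split that perfectly separates two classes scores 1.0.
y = np.array([0, 0, 0, 1, 1, 1])
print(hellinger_split_score(y[:3], y[3:], classes=[0, 1]))  # -> 1.0
```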
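The Experiment Setup and Dataset Splits rows describe eight unpruned decision trees compared via 10-fold cross-validation. scikit-learn does not implement the paper-specific criteria (iHD, iHDw, GR, DCSM, HDDT, CCPDT), so the sketch below reproduces only the evaluation protocol using the two built-in criteria; the stand-in dataset and the accuracy metric are assumptions for illustration, not details taken from the paper.

```python
from sklearn.datasets import load_breast_cancer  # stand-in UCI-style dataset
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Unpruned trees (no depth/leaf limits), differing only in split criterion.
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=42)
    acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"{criterion}: mean accuracy = {acc.mean():.3f} (std {acc.std():.3f})")
```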
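Finally, the Research Type row mentions a Friedman test to establish statistical significance across methods. A minimal sketch with SciPy follows, assuming a hypothetical score matrix with one row per dataset and one column per method; the random scores here are placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical accuracies: rows = datasets, columns = methods
# (e.g., iHD, iHDw, Entropy, Gini, GR, DCSM, HDDT, CCPDT).
rng = np.random.default_rng(0)
scores = rng.uniform(0.7, 0.95, size=(20, 8))

# The Friedman test ranks the methods within each dataset and tests
# whether the mean ranks differ significantly across methods.
stat, p_value = friedmanchisquare(*scores.T)
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")
```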