Sparse Perceptron Decision Tree for Millions of Dimensions

Authors: Weiwei Liu, Ivor Tsang

AAAI 2016

Reproducibility assessment. Each variable is listed with its result and the supporting LLM response.

Research Type: Experimental. "Extensive empirical studies verify that our SPDT is more resilient to noisy features and effectively generates a small, yet accurate decision tree. Compared with state-of-the-art DT methods and SVM, our SPDT achieves better generalization performance on ultrahigh dimensional problems with more than 1 million features."

Researcher Affiliation: Academia. "Weiwei Liu and Ivor W. Tsang, Centre for Quantum Computation and Intelligent Systems, University of Technology Sydney, Australia. liuweiwei863@gmail.com, ivor.tsang@uts.edu.au"

Pseudocode: Yes. "Algorithm 1 Sparse Perceptron Decision Tree (SPDT)"

Open Source Code: No. The paper mentions modifying the FGM software available at http://www.tanmingkui.com/fgm.html, but it does not state that its own SPDT implementation is open-sourced or provide a link to one.

Open Datasets: Yes. "Most data sets are collected from this website [http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/]. pcmac data set is from (Xu et al. 2014)."

Dataset Splits: Yes. "We use 5-fold cross validation to prune SPDT. ... C is selected using 5-fold cross validation over the range {0.001, 0.01, 0.1, 5, 10} for the first three data sets and we fix C = 5 for larger data sets like epsilon and rcv."

Hardware Specification: No. The paper does not report hardware details such as CPU/GPU models, processor types, or memory sizes used to run its experiments.

Software Dependencies: No. The paper mentions software such as LIBLINEAR and notes that some methods are implemented in C++ or Matlab, but it gives no version numbers for any dependency.

Experiment Setup: Yes. "Following the parameter settings in (Tan, Tsang, and Wang 2014), B is chosen in a range of {2, 5, 10, 20, 50, 100, 150, 200, 250} for the rcv data set and {0.01m, 0.02m, ..., 0.09m} for other data sets; C is selected using 5-fold cross validation over the range {0.001, 0.01, 0.1, 5, 10} for the first three data sets and we fix C = 5 for larger data sets like epsilon and rcv. The tree-depth is fixed to 3 in LDKL, following the settings in (Oiwa and Fujimaki 2014)."
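The 5-fold cross-validation selection of C described in the setup above can be sketched generically. This is a minimal, stdlib-only illustration, not the authors' code: the `evaluate` callback is a hypothetical stand-in for training and scoring the actual base learner (the paper modifies FGM, which is not reproduced here).

```python
from statistics import mean

# C grid quoted in the paper's setup for the first three data sets.
C_GRID = [0.001, 0.01, 0.1, 5, 10]


def k_fold_indices(n, k=5):
    """Partition sample indices 0..n-1 into k contiguous validation folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def select_C(n_samples, evaluate, k=5):
    """Return the C with the best mean validation score over k folds.

    `evaluate(train_idx, val_idx, C)` is a hypothetical callback that trains
    the base learner on `train_idx` with regularization parameter C and
    returns a validation accuracy on `val_idx`.
    """
    folds = k_fold_indices(n_samples, k)
    best_C, best_score = None, float("-inf")
    for C in C_GRID:
        scores = []
        for i, val_idx in enumerate(folds):
            # All remaining folds form the training split for this round.
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            scores.append(evaluate(train_idx, val_idx, C))
        if mean(scores) > best_score:
            best_C, best_score = C, mean(scores)
    return best_C, best_score
```

For the larger data sets (epsilon, rcv) the paper skips this search and fixes C = 5; only the smaller data sets go through the grid.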