Sparse Perceptron Decision Tree for Millions of Dimensions
Authors: Weiwei Liu, Ivor Tsang
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical studies verify that our SPDT is more resilient to noisy features and effectively generates a small, yet accurate decision tree. Compared with state-of-the-art DT methods and SVM, our SPDT achieves better generalization performance on ultrahigh dimensional problems with more than 1 million features. |
| Researcher Affiliation | Academia | Weiwei Liu and Ivor W. Tsang Centre for Quantum Computation and Intelligent Systems University of Technology Sydney, Australia liuweiwei863@gmail.com, ivor.tsang@uts.edu.au |
| Pseudocode | Yes | Algorithm 1 Sparse Perceptron Decision Tree (SPDT) |
| Open Source Code | No | The paper mentions modifying the FGM software available at http://www.tanmingkui.com/fgm.html, but it provides no statement or link releasing the authors' own SPDT implementation. |
| Open Datasets | Yes | Most data sets are collected from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/; the pcmac data set is from (Xu et al. 2014). |
| Dataset Splits | Yes | We use 5-fold cross validation to prune SPDT. ... C is selected using 5-fold cross validation over the range {0.001, 0.01, 0.1, 5, 10} for the first three data sets and we fix C = 5 for larger data sets like epsilon and rcv. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions software like LIBLINEAR and notes that some methods are implemented in C++ or Matlab, but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Following the parameter settings in (Tan, Tsang, and Wang 2014), B is chosen in a range of {2, 5, 10, 20, 50, 100, 150, 200, 250} for the rcv data set and {0.01m, 0.02m, ..., 0.09m} for other data sets; C is selected using 5-fold cross validation over the range {0.001, 0.01, 0.1, 5, 10} for the first three data sets and we fix C = 5 for larger data sets like epsilon and rcv. The tree-depth is fixed to 3 in LDKL, following the settings in (Oiwa and Fujimaki 2014). |
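The 5-fold cross-validation grid search over C described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: the grid of C values is quoted from the paper, but `score_fn`, `k_fold_indices`, and `select_C` are stand-in names, and the toy scorer below replaces the paper's actual SPDT/FGM training at each fold.

```python
# Hypothetical sketch of 5-fold cross-validation over the regularization
# parameter C, as described in the paper's experiment setup. The scoring
# function is a placeholder for training/evaluating the real model.
from statistics import mean

C_GRID = [0.001, 0.01, 0.1, 5, 10]  # range quoted in the paper


def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n samples."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        val = idx[i * fold:(i + 1) * fold] if i < k - 1 else idx[i * fold:]
        val_set = set(val)
        train = [j for j in idx if j not in val_set]
        yield train, val


def select_C(n_samples, score_fn, grid=C_GRID, k=5):
    """Return the C whose mean validation score over k folds is highest."""
    best_C, best_score = None, float("-inf")
    for C in grid:
        scores = [score_fn(C, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        m = mean(scores)
        if m > best_score:
            best_C, best_score = C, m
    return best_C


# Toy scorer standing in for SPDT training: validation accuracy peaks at C = 5.
toy_score = lambda C, train_idx, val_idx: 1.0 - abs(C - 5) / 10
print(select_C(100, toy_score))  # -> 5
```

In the paper this selection is run per data set for the first three data sets only; for the larger epsilon and rcv data sets, C is fixed to 5 instead of being searched.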