Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sparse Perceptron Decision Tree for Millions of Dimensions

Authors: Weiwei Liu, Ivor Tsang

AAAI 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical studies verify that our SPDT is more resilient to noisy features and effectively generates a small, yet accurate decision tree. Compared with state-of-the-art DT methods and SVM, our SPDT achieves better generalization performance on ultrahigh dimensional problems with more than 1 million features. |
| Researcher Affiliation | Academia | Weiwei Liu and Ivor W. Tsang Centre for Quantum Computation and Intelligent Systems University of Technology Sydney, Australia EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Sparse Perceptron Decision Tree (SPDT) |
| Open Source Code | No | The paper mentions modifying the FGM software available at http://www.tanmingkui.com/fgm.html, but does not provide a statement or link for the open-sourcing of their own SPDT methodology. |
| Open Datasets | Yes | Most data sets are collected from this website¹. pcmac data set is from (Xu et al. 2014). ¹http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| Dataset Splits | Yes | We use 5-fold cross validation to prune SPDT. ... C is selected using 5-fold cross validation over the range {0.001, 0.01, 0.1, 5, 10} for the first three data sets and we fix C = 5 for larger data sets like epsilon and rcv. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions software like LIBLINEAR and notes that some methods are implemented in C++ or Matlab, but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Following the parameter settings in (Tan, Tsang, and Wang 2014), B is chosen in a range of {2, 5, 10, 20, 50, 100, 150, 200, 250} for the rcv data set and {0.01m, 0.02m, ..., 0.09m} for other data sets; C is selected using 5-fold cross validation over the range {0.001, 0.01, 0.1, 5, 10} for the first three data sets and we fix C = 5 for larger data sets like epsilon and rcv. The tree-depth is fixed to 3 in LDKL, following the settings in (Oiwa and Fujimaki 2014). |
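
The parameter grids and the 5-fold selection of C quoted in the Dataset Splits and Experiment Setup rows can be sketched roughly as follows. This is an illustration only, not the authors' code: the function names, the `score` callback, the shuffling seed, the rounding of B to an integer, and the reading of m as the feature count are all assumptions that the summary above does not confirm.

```python
import random

# Candidate values quoted in the Experiment Setup row.
C_GRID = [0.001, 0.01, 0.1, 5, 10]
B_GRID_RCV = [2, 5, 10, 20, 50, 100, 150, 200, 250]


def b_grid_for(m):
    """B candidates {0.01m, ..., 0.09m}.

    Assumption: m is the number of features, and each value is
    rounded to an integer of at least 1.
    """
    return [max(1, round(k * m / 100)) for k in range(1, 10)]


def k_fold_indices(n, k=5, seed=0):
    """Shuffle range(n) and yield (train, validation) index lists per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, val


def select_c(n, score, grid=C_GRID, k=5):
    """Return the C with the best mean validation score over k folds.

    `score(c, train, val)` is a hypothetical callback that fits a model
    with penalty c on `train` and returns its accuracy on `val`.
    """
    splits = list(k_fold_indices(n, k))

    def mean_score(c):
        return sum(score(c, tr, va) for tr, va in splits) / k

    return max(grid, key=mean_score)
```

A caller would supply its own `score` function wrapping whichever learner is being tuned; for the larger data sets the rows above report skipping this search and fixing C = 5.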