Semi-Supervised Learning with Decision Trees: Graph Laplacian Tree Alternating Optimization

Authors: Arman Zharmagambetov, Miguel Á. Carreira-Perpiñán

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results (section 4) show the algorithm is able to learn accurate and interpretable decision trees even with very few labeled instances. This section presents our experimental findings. We demonstrate that the proposed method dominates other semi-supervised learning frameworks in accuracy and approaches the fully supervised baseline with a far smaller amount of labeled data."
Researcher Affiliation | Academia | "Arman Zharmagambetov, Dept. of Computer Science and Engineering, University of California, Merced"
Pseudocode | No | "We call our algorithm LapTAO and provide detailed pseudocode in the suppl. mat."
Open Source Code | No | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]"
Open Datasets | Yes | "For instance, in the case of 3% in cpu_act and 1% in MNIST, the difference in the error with the second best SSL approach is several orders of magnitude. It shows acceptable results even in extreme label scarcity scenarios, e.g. when we provide < 0.5% of labeled data on year_pred and susy. Therefore, we pick the subset of Fashion-MNIST (3 classes: shirt, bag, and ankle boot), resulting in 18k training points." (a subset-construction sketch follows the table)
Dataset Splits | Yes | "Regarding hyperparameters, given the fixed cross-validation set (1% of train data), we explored as best as we could all important hyperparameters for all methods (see details in the suppl. mat.)." (a split sketch follows the table)
Hardware Specification | Yes | "Please note that we ran our code on a regular PC (Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz, 32GB RAM), with little parallel processing and using an unoptimized Python implementation. Therefore, the training runtime for LapTAO can be significantly improved. We did not use any GPUs."
Software Dependencies | No | The paper mentions software such as LIBLINEAR and LIBSVM but does not provide specific version numbers for these or other key dependencies. It states only that an "unoptimized Python implementation" was used.
Experiment Setup | Yes | "Regarding hyperparameters, given the fixed cross-validation set (1% of train data), we explored as best as we could all important hyperparameters for all methods (see details in the suppl. mat.). These include: the tree depth, the confidence threshold for self-training, σ and C values for LapSVM, etc. We use γ = 0.1 in all experiments. As for the main loop of the augmented Lagrangian, we iterate 20 times, starting from a small value µ0 = 0.001 that is multiplied by 1.5 after each iteration." (the outer-loop schedule is sketched below)
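
The Open Datasets row describes a 3-class Fashion-MNIST subset (shirt, bag, ankle boot; 18k training points). Below is a minimal sketch of how such a subset could be reconstructed; the use of torchvision's loader is our assumption, not the authors' stated tooling. Only the class names and the 18k count come from the paper (the label ids 6, 8, 9 are the standard Fashion-MNIST indices for those classes).

```python
# Minimal sketch: 3-class Fashion-MNIST subset (shirt, bag, ankle boot).
# torchvision is an assumption; the paper does not state which loader was used.
import numpy as np
from torchvision.datasets import FashionMNIST

CLASSES = {6: "shirt", 8: "bag", 9: "ankle boot"}  # standard Fashion-MNIST label ids

train = FashionMNIST(root="./data", train=True, download=True)
X = train.data.numpy().reshape(len(train), -1) / 255.0  # flatten 28x28 images
y = train.targets.numpy()

mask = np.isin(y, list(CLASSES))
X_sub, y_sub = X[mask], y[mask]
print(X_sub.shape)  # (18000, 784): 3 classes x 6000 training images each
```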
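The Dataset Splits row fixes a cross-validation set of 1% of the training data, and the experiments then keep labels for only a small fraction of the remaining points (e.g., 1% on MNIST). Continuing from the subset above, here is a sketch of both steps; scikit-learn, the stratification, the random seed, and the -1 "unlabeled" marker are our assumptions, as the paper only specifies the proportions.

```python
# Sketch: fixed 1% cross-validation split plus simulated label scarcity.
import numpy as np
from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(
    X_sub, y_sub, test_size=0.01, stratify=y_sub, random_state=0)

rng = np.random.default_rng(0)
labeled_mask = rng.random(len(y_tr)) < 0.01   # keep labels for ~1% of points
y_semi = np.where(labeled_mask, y_tr, -1)     # -1 marks unlabeled points
```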
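Finally, the Experiment Setup row pins down the outer optimization loop: γ = 0.1, 20 iterations, µ0 = 0.001 multiplied by 1.5 after each iteration. The sketch below shows the general shape of such a penalty-method loop, alternating a Laplacian-regularized pseudolabel solve with a tree refit. The k-NN graph construction, the exact linear system, and the use of a plain CART regressor in place of the authors' TAO tree update are all our assumptions; the authors' actual algorithm and pseudocode are in the paper and its supplement.

```python
# Hedged sketch of the outer loop from the Experiment Setup row:
# gamma = 0.1, 20 iterations, mu_0 = 0.001 multiplied by 1.5 per iteration.
# The pseudolabel step solves a Laplacian-regularized least-squares problem;
# a plain CART regressor stands in for the authors' TAO tree update.
import numpy as np
from scipy.sparse import diags, identity, csr_matrix
from scipy.sparse.linalg import splu
from sklearn.neighbors import kneighbors_graph
from sklearn.tree import DecisionTreeRegressor

def knn_laplacian(X, k=10):
    """Unnormalized graph Laplacian L = D - W on a symmetrized k-NN graph."""
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T)
    return csr_matrix(diags(np.asarray(W.sum(axis=1)).ravel()) - W)

def lap_tao_sketch(X, Y_onehot, labeled_mask, gamma=0.1, mu0=1e-3, rho=1.5,
                   iters=20, depth=8):  # depth is illustrative, not tuned
    n = X.shape[0]
    L = knn_laplacian(X)
    M = diags(labeled_mask.astype(float))            # selects labeled points
    Y_obs = np.where(labeled_mask[:, None], Y_onehot, 0.0)  # = M @ Y
    tree = DecisionTreeRegressor(max_depth=depth)
    T_pred, mu = np.zeros_like(Y_obs), mu0
    for _ in range(iters):
        # (a) pseudolabel step: (M + gamma*L + mu*I) Ybar = M Y + mu T(X)
        A = (M + gamma * L + mu * identity(n)).tocsc()
        Y_bar = splu(A).solve(Y_obs + mu * T_pred)
        # (b) tree step: refit the tree to the current pseudolabels
        tree.fit(X, Y_bar)
        T_pred = tree.predict(X)
        mu *= rho                                    # anneal the penalty upward
    return tree
```

With the pieces above, calling lap_tao_sketch(X_tr, Y, labeled_mask) with Y a one-hot encoding of y_tr would run the 20-iteration schedule; again, this illustrates the shape of the loop described in the paper, not the authors' implementation.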