Semi-Supervised Learning with Decision Trees: Graph Laplacian Tree Alternating Optimization
Authors: Arman Zharmagambetov, Miguel A. Carreira-Perpinan
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results (Section 4) show the algorithm learns accurate and interpretable decision trees even with very few labeled instances. The experiments demonstrate that the proposed method outperforms other semi-supervised learning frameworks in accuracy and approaches the fully supervised baseline with far less labeled data. |
| Researcher Affiliation | Academia | Arman Zharmagambetov, Dept. of Computer Science and Engineering, University of California, Merced |
| Pseudocode | No | Pseudocode appears only in the supplementary material, not in the main text: "We call our algorithm LapTAO and provide detailed pseudocode in the suppl. mat." |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | Public benchmarks are used throughout, e.g.: "in case of 3% in cpu_act and 1% in MNIST, the difference in the error with the second best SSL approach is several orders of magnitude"; "extreme label scarcity scenarios, e.g. when we provide < 0.5% of labeled data on year_pred and susy"; and "we pick the subset of Fashion-MNIST (3 classes: shirt, bag and ankle boot) resulting in 18k training points" (a loading sketch for this subset follows the table). |
| Dataset Splits | Yes | "Regarding hyperparameters, given the fixed cross-validation set (1% of train data), we explored as best as we could all important hyperparameters for all methods (see details in the suppl. mat.)." A sketch of this split protocol follows the table. |
| Hardware Specification | Yes | Please note that we ran our code on a regular PC (Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz, 32GB RAM), with little parallel processing and using unoptimized Python implementation. Therefore, the training runtime for LapTAO can be significantly improved. We did not use any GPUs. |
| Software Dependencies | No | The paper mentions software such as LIBLINEAR and LIBSVM but gives no version numbers for these or other key dependencies, noting only an 'unoptimized Python implementation'. |
| Experiment Setup | Yes | Regarding hyperparameters, given the fixed cross-validation set (1% of train data), we explored as best as we could all important hyperparameters for all methods (see details in the suppl. mat.). These include: controlling the tree depth, the confidence threshold for self-training, σ and C values for LapSVM, etc. We use γ = 0.1 in all experiments. As for the main loop of the augmented Lagrangian, we iterate 20 times starting from a small value µ0 = 0.001, multiplied by 1.5 after each iteration. A sketch of this schedule follows the table. |
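Below are three hedged sketches reconstructing the reproducible details quoted above; all helper names, loaders, and simplifications are ours, not the authors'. First, the Fashion-MNIST subset: keeping the three classes shirt, bag and ankle boot (label indices 6, 8 and 9, each with 6,000 training images) yields the 18k training points the paper mentions. Loading via torchvision is our assumption; the paper does not specify a loader.

```python
# Hypothetical reconstruction of the 3-class Fashion-MNIST subset
# (shirt=6, bag=8, ankle boot=9 in the standard label encoding).
# Loading through torchvision is our assumption, not the paper's.
import numpy as np
from torchvision.datasets import FashionMNIST

ds = FashionMNIST(root="./data", train=True, download=True)
X = ds.data.numpy().reshape(len(ds), -1) / 255.0   # flatten 28x28 images
y = ds.targets.numpy()

keep = np.isin(y, [6, 8, 9])        # shirt, bag, ankle boot
X3, y3 = X[keep], y[keep]           # 3 classes x 6,000 = 18,000 points
```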
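Second, the split protocol: a fixed cross-validation set of 1% of the training data is held out for hyperparameter tuning, and only a small fraction of the remaining points keep their labels (e.g. 3% for cpu_act, 1% for MNIST). A minimal sketch, with the function and parameter names being our own:

```python
# Minimal sketch of the split protocol described in the table.
# `labeled_frac` mimics the label-scarcity settings reported in the
# paper's experiments (e.g. 0.03 or 0.01).
import numpy as np
from sklearn.model_selection import train_test_split

def make_ssl_splits(X, y, labeled_frac=0.01, seed=0):
    # Fixed cross-validation set: 1% of the training data.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.01, random_state=seed)
    # Reveal labels for only a small fraction of the remaining points;
    # the rest are treated as unlabeled by the SSL methods.
    rng = np.random.default_rng(seed)
    labeled_mask = rng.random(len(y_tr)) < labeled_frac
    return X_tr, y_tr, labeled_mask, X_val, y_val
```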
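Finally, the optimization schedule: the paper alternates between smoothing pseudo-labels over a graph Laplacian and refitting the tree, coupling the two with a penalty µ that starts at 0.001 and grows by a factor of 1.5 over 20 outer iterations, with γ = 0.1. The sketch below is not the authors' algorithm: it uses a plain quadratic penalty in place of the full augmented Lagrangian and sklearn's CART in place of the TAO-trained tree, and every helper name is hypothetical.

```python
# Hedged sketch of a LapTAO-style outer loop under stated assumptions:
# quadratic penalty instead of the full augmented Lagrangian, and
# sklearn's CART standing in for the TAO tree optimizer.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lap_tao_sketch(X, Y, labeled_mask, L, gamma=0.1,
                   mu0=1e-3, mu_factor=1.5, n_outer=20, depth=8):
    """X: (n, d) features; Y: (n, k) targets (one-hot for classification;
    rows of unlabeled points are ignored); labeled_mask: (n,) bool;
    L: (n, n) dense graph Laplacian."""
    n, k = Y.shape
    M = np.diag(labeled_mask.astype(float))     # selects labeled points
    # Warm start: fit the tree on the labeled subset only.
    tree = DecisionTreeRegressor(max_depth=depth)
    tree.fit(X[labeled_mask], Y[labeled_mask])

    mu = mu0
    for _ in range(n_outer):
        T_X = tree.predict(X).reshape(n, k)
        # Z-step (closed form): argmin_Z ||M(Z - Y)||^2
        #     + gamma * tr(Z^T L Z) + mu * ||Z - T(X)||^2
        Z = np.linalg.solve(M + gamma * L + mu * np.eye(n),
                            M @ Y + mu * T_X)
        # Tree step: refit the tree to the smoothed pseudo-labels.
        tree = DecisionTreeRegressor(max_depth=depth)
        tree.fit(X, Z)
        mu *= mu_factor                         # anneal the coupling
    return tree, Z
```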