Semi-Supervised Learning with Decision Trees: Graph Laplacian Tree Alternating Optimization
Authors: Arman Zharmagambetov, Miguel A. Carreira-Perpinan
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results (Section 4) show the algorithm learns accurate and interpretable decision trees even with very few labeled instances. The experiments demonstrate that the proposed method outperforms other semi-supervised learning frameworks in accuracy and approaches the fully supervised baseline with far less labeled data. |
| Researcher Affiliation | Academia | Arman Zharmagambetov, Dept. of Computer Science and Engineering, University of California, Merced |
| Pseudocode | No | Pseudocode appears only in the supplementary material, not in the main text: "We call our algorithm LapTAO and provide detailed pseudocode in the suppl. mat." |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | Public benchmarks are used throughout, e.g.: "in case of 3% in cpu_act and 1% in MNIST, the difference in the error with the second best SSL approach is several orders of magnitude"; "extreme label scarcity scenarios, e.g. when we provide < 0.5% of labeled data on year_pred and susy"; and "we pick the subset of Fashion-MNIST (3 classes: shirt, bag and ankle boot) resulting in 18k training points" (a loading sketch for this subset follows the table). |
| Dataset Splits | Yes | "Regarding hyperparameters, given the fixed cross-validation set (1% of train data), we explored as best as we could all important hyperparameters for all methods (see details in the suppl. mat.)." A sketch of this split protocol follows the table. |
| Hardware Specification | Yes | Please note that we ran our code on a regular PC (Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz, 32GB RAM), with little parallel processing and using unoptimized Python implementation. Therefore, the training runtime for LapTAO can be significantly improved. We did not use any GPUs. |
| Software Dependencies | No | The paper mentions software such as LIBLINEAR and LIBSVM but gives no version numbers for these or other key dependencies, noting only an 'unoptimized Python implementation'. |
| Experiment Setup | Yes | Regarding hyperparameters, given the fixed cross-validation set (1% of train data), we explored as best as we could all important hyperparameters for all methods (see details in the suppl. mat.). These include: controlling the tree depth, the confidence threshold for self-training, σ and C values for LapSVM, etc. We use γ = 0.1 in all experiments. As for the main loop of the augmented Lagrangian, we iterate 20 times starting from a small value µ0 = 0.001, multiplied by 1.5 after each iteration. A sketch of this schedule follows the table. |
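Below are three hedged sketches reconstructing the reproducible details quoted above; all helper names, loaders, and simplifications are ours, not the authors'. First, the Fashion-MNIST subset: keeping the three classes shirt, bag and ankle boot (label indices 6, 8 and 9, each with 6,000 training images) yields the 18k training points the paper mentions. Loading via torchvision is our assumption; the paper does not specify a loader.

```python
# Hypothetical reconstruction of the 3-class Fashion-MNIST subset
# (shirt=6, bag=8, ankle boot=9 in the standard label encoding).
# Loading through torchvision is our assumption, not the paper's.
import numpy as np
from torchvision.datasets import FashionMNIST

ds = FashionMNIST(root="./data", train=True, download=True)
X = ds.data.numpy().reshape(len(ds), -1) / 255.0   # flatten 28x28 images
y = ds.targets.numpy()

keep = np.isin(y, [6, 8, 9])        # shirt, bag, ankle boot
X3, y3 = X[keep], y[keep]           # 3 classes x 6,000 = 18,000 points
```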
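Second, the split protocol: a fixed cross-validation set of 1% of the training data is held out for hyperparameter tuning, and only a small fraction of the remaining points keep their labels (e.g. 3% for cpu_act, 1% for MNIST). A minimal sketch, with the function and parameter names being our own:

```python
# Minimal sketch of the split protocol described in the table.
# `labeled_frac` mimics the label-scarcity settings reported in the
# paper's experiments (e.g. 0.03 or 0.01).
import numpy as np
from sklearn.model_selection import train_test_split

def make_ssl_splits(X, y, labeled_frac=0.01, seed=0):
    # Fixed cross-validation set: 1% of the training data.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.01, random_state=seed)
    # Reveal labels for only a small fraction of the remaining points;
    # the rest are treated as unlabeled by the SSL methods.
    rng = np.random.default_rng(seed)
    labeled_mask = rng.random(len(y_tr)) < labeled_frac
    return X_tr, y_tr, labeled_mask, X_val, y_val
```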
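Finally, the optimization schedule: the paper alternates between smoothing pseudo-labels over a graph Laplacian and refitting the tree, coupling the two with a penalty µ that starts at 0.001 and grows by a factor of 1.5 over 20 outer iterations, with γ = 0.1. The sketch below is not the authors' algorithm: it uses a plain quadratic penalty in place of the full augmented Lagrangian and sklearn's CART in place of the TAO-trained tree, and every helper name is hypothetical.

```python
# Hedged sketch of a LapTAO-style outer loop under stated assumptions:
# quadratic penalty instead of the full augmented Lagrangian, and
# sklearn's CART standing in for the TAO tree optimizer.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lap_tao_sketch(X, Y, labeled_mask, L, gamma=0.1,
                   mu0=1e-3, mu_factor=1.5, n_outer=20, depth=8):
    """X: (n, d) features; Y: (n, k) targets (one-hot for classification;
    rows of unlabeled points are ignored); labeled_mask: (n,) bool;
    L: (n, n) dense graph Laplacian."""
    n, k = Y.shape
    M = np.diag(labeled_mask.astype(float))     # selects labeled points
    # Warm start: fit the tree on the labeled subset only.
    tree = DecisionTreeRegressor(max_depth=depth)
    tree.fit(X[labeled_mask], Y[labeled_mask])

    mu = mu0
    for _ in range(n_outer):
        T_X = tree.predict(X).reshape(n, k)
        # Z-step (closed form): argmin_Z ||M(Z - Y)||^2
        #     + gamma * tr(Z^T L Z) + mu * ||Z - T(X)||^2
        Z = np.linalg.solve(M + gamma * L + mu * np.eye(n),
                            M @ Y + mu * T_X)
        # Tree step: refit the tree to the smoothed pseudo-labels.
        tree = DecisionTreeRegressor(max_depth=depth)
        tree.fit(X, Z)
        mu *= mu_factor                         # anneal the coupling
    return tree, Z
```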