Analyzing Tree Architectures in Ensembles via Neural Tangent Kernel
Authors: Ryuichi Kanoh, Mahito Sugiyama
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally examined the effects of the degeneracy phenomenon discussed in Section 4.2. Setup: We used 90 classification tasks in the UCI database (Dua & Graff, 2017), each of which has fewer than 5000 data points, as in Arora et al. (2020). We performed kernel regression using the limiting NTK defined in Equation 5 and Equation 11, equivalent to infinite ensembles of perfect binary trees and decision lists (see the kernel-regression sketch after this table). ... Performance: Figure 8 shows the averaged classification accuracy on the 90 datasets. |
| Researcher Affiliation | Academia | Ryuichi Kanoh (1,2), Mahito Sugiyama (1,2); 1: National Institute of Informatics; 2: The Graduate University for Advanced Studies, SOKENDAI |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | REPRODUCIBILITY STATEMENT: Proofs are provided in the Appendix. For numerical experiments and figures, reproducible source code is shared in the supplementary material. |
| Open Datasets | Yes | We used 90 classification tasks in the UCI database (Dua & Graff, 2017) |
| Dataset Splits | Yes | We report four-fold cross-validation performance with random data splitting, as in Arora et al. (2020) and Fernández-Delgado et al. (2014). |
| Hardware Specification | Yes | We ran all experiments on 2.20 GHz Intel Xeon E5-2698 CPU and 252 GB of memory with Ubuntu Linux (version: 4.15.0-117-generic). |
| Software Dependencies | No | We used scikit-learn to perform kernel regression. We used scikit-learn for the implementation. (No specific version numbers are provided for these software components.) |
| Experiment Setup | Yes | We used D in {2, 4, 8, 16, 32, 64, 128} and α in {1.0, 2.0, 4.0, 8.0, 16.0, 32.0}. The scaled error function is used as a decision function. To consider the ridge-less situation, the regularization strength is fixed to 1.0 × 10⁻⁸. As for hyperparameters, we used max_depth in {2, 4, 6}, subsample in {0.6, 0.8, 1.0}, learning_rate in {0.1, 0.01, 0.001}, and n_estimators (the number of trees) in {100, 300, 500} (see the grid-search sketch after this table). |
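
The evaluation protocol quoted in the Research Type, Dataset Splits, and Experiment Setup rows (kernel regression with the limiting NTK, four-fold cross-validation with random splitting, ridge-less regularization of 1.0 × 10⁻⁸) can be summarized in code. Below is a minimal sketch assuming scikit-learn's `KernelRidge` with a precomputed kernel; `tree_ntk` is a hypothetical placeholder (an RBF kernel here) standing in for the paper's closed-form limiting NTK from Equations 5 and 11, which is not reproduced in this report.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold


def tree_ntk(X1, X2, alpha=2.0):
    """Hypothetical stand-in for the paper's limiting NTK (Eqs. 5/11).

    The closed form is derived in the paper and not reproduced here;
    an RBF kernel is substituted so the sketch runs end to end.
    """
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * alpha ** 2))


def cv_accuracy(X, y, alpha=2.0):
    """Four-fold CV accuracy of ridge-less kernel regression."""
    classes = np.unique(y)
    accs = []
    for tr, te in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
        K_tr = tree_ntk(X[tr], X[tr], alpha)  # train-vs-train Gram matrix
        K_te = tree_ntk(X[te], X[tr], alpha)  # test-vs-train Gram matrix
        # Ridge-less situation: regularization fixed to 1.0e-8.
        model = KernelRidge(alpha=1.0e-8, kernel="precomputed")
        # Regress one-hot class targets, then predict by argmax.
        Y_tr = np.eye(len(classes))[np.searchsorted(classes, y[tr])]
        model.fit(K_tr, Y_tr)
        pred = classes[model.predict(K_te).argmax(axis=1)]
        accs.append(float((pred == y[te]).mean()))
    return float(np.mean(accs))
```

The one-hot-regression-plus-argmax reduction of classification to kernel regression follows the Arora et al. (2020) protocol that the paper cites; swapping in the paper's actual NTK only requires replacing `tree_ntk`.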
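The baseline hyperparameter sweep in the Experiment Setup row maps directly onto a grid search. Below is a hedged sketch assuming scikit-learn's `GradientBoostingClassifier`, whose parameter names match those quoted (`max_depth`, `subsample`, `learning_rate`, `n_estimators`); the report does not state which gradient-boosting implementation the paper actually used.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Hyperparameter grid quoted in the Experiment Setup row.
param_grid = {
    "max_depth": [2, 4, 6],
    "subsample": [0.6, 0.8, 1.0],
    "learning_rate": [0.1, 0.01, 0.001],
    "n_estimators": [100, 300, 500],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=4,                # four-fold cross-validation, as reported
    scoring="accuracy",
)
# search.fit(X, y)  # X, y: one of the 90 UCI classification tasks
```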