Sparse Learning with CART
Authors: Jason Klusowski
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Fig. 1a and Fig. 1c, we generate 1000 samples from the model Y = ∑_{j=1}^{d₀} g_j(X_j), where each g_j(X_j) equals ±X_j² (alternating signs) and X ~ Uniform([0, 1]^d). In Fig. 1a, we plot the test error, averaged over 10 independent replications, of pruned CART vs. k-NN (with cross-validated k) as d ranges from 5 to 100 with d₀ = 5 fixed. A similar experiment is performed in Fig. 1b on the Boston housing dataset [4, Section 8.2] (d₀ = 10 and n = 506), where we scale the inputs to be in [0, 1] and add d − d₀ noisy Uniform([0, 1]) input variables. (A code sketch of this synthetic setup follows the table.) |
| Researcher Affiliation | Academia | Jason M. Klusowski, Department of Operations Research & Financial Engineering, Princeton University, Princeton, New Jersey 08544, jason.klusowski@princeton.edu |
| Pseudocode | No | The paper describes algorithms and procedures in prose, such as 'The CART algorithm is comprised of two elements: a growing procedure and a pruning procedure.' However, it does not include any formally structured pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to a code repository. |
| Open Datasets | Yes | A similar experiment is performed in Fig. 1b on the Boston housing dataset [4, Section 8.2] (d₀ = 10 and n = 506), where we scale the inputs to be in [0, 1] and add d − d₀ noisy Uniform([0, 1]) input variables. [4] refers to: Leo Breiman, Jerome Friedman, R. A. Olshen, and Charles J. Stone. Classification and Regression Trees. Chapman and Hall/CRC, 1984. (A hedged loading sketch also follows the table.) |
| Dataset Splits | No | In Fig. 1a, we plot the test error, averaged over 10 independent replications, of pruned CART vs. k-NN (with cross-validated k)... While 'cross-validated k' implies a validation process, the paper does not specify the explicit splits (e.g., percentages or counts) for training, validation, and test sets to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to conduct the experiments, such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper mentions general techniques and algorithms like 'bagging [2] and random forests [3]' or 'Tree Boost [7]' but does not list specific software libraries or packages with their version numbers that were used for implementation. |
| Experiment Setup | No | The paper describes the synthetic data generation and the use of the Boston housing dataset, mentioning 'd ranges from 5 to 100' with 'd₀ = 5 fixed' and 'd₀ = 10 and n = 506'. It also mentions 'k-NN (with cross-validated k)'. However, it lacks the hyperparameters and configuration settings for the CART and k-NN models themselves (e.g., the pruning parameter or max_depth for CART, the range of k searched for k-NN) that would allow full reproducibility of the experimental setup. |
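
The Research Type row describes the Fig. 1a setup precisely enough to sketch. Below is a minimal, hedged reconstruction assuming scikit-learn: the paper does not name its implementation, scikit-learn's cost-complexity pruning (`ccp_alpha`, selected by cross-validation) stands in for the paper's pruning procedure, and the 75/25 split, the (−1)^j sign convention for g_j, and the k-grid of 1–50 are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
n, d, d0 = 1000, 50, 5  # sample size and sparsity from Fig. 1a; d is one point of the 5-100 sweep

# X ~ Uniform([0, 1]^d); only the first d0 coordinates matter:
# Y = sum_{j=1}^{d0} g_j(X_j), with g_j(x) = (-1)^j * x^2 as one way to alternate signs.
X = rng.uniform(0.0, 1.0, size=(n, d))
signs = np.array([(-1) ** j for j in range(d0)])
y = (signs * X[:, :d0] ** 2).sum(axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Pruned CART: grow a deep tree, then pick the cost-complexity pruning level
# by cross-validation over the tree's own pruning path.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alphas = np.clip(path.ccp_alphas, 0.0, None)  # guard against tiny negative values from rounding
cart = GridSearchCV(DecisionTreeRegressor(random_state=0), {"ccp_alpha": alphas}, cv=5)
cart.fit(X_train, y_train)

# k-NN with cross-validated k, the paper's comparison method.
knn = GridSearchCV(KNeighborsRegressor(), {"n_neighbors": list(range(1, 51))}, cv=5)
knn.fit(X_train, y_train)

print("pruned CART test MSE:", np.mean((cart.predict(X_test) - y_test) ** 2))
print("k-NN test MSE:", np.mean((knn.predict(X_test) - y_test) ** 2))
```

Repeating this over d from 5 to 100 and averaging 10 replications would mirror the Fig. 1a curve, modulo the assumptions above.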
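
For the Fig. 1b variant in the Open Datasets row, the sketch below prepares the Boston housing inputs as described: scale to [0, 1], then pad with d − d₀ noisy Uniform([0, 1]) variables. This assumes a hypothetical local `boston.csv` with the standard columns and a `MEDV` target, since recent scikit-learn releases no longer ship this dataset; d = 100 is an illustrative choice, and the paper's exact d₀ = 10 informative inputs are not identified (the raw dataset has 13 columns).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical local copy of Boston housing (n = 506); recent scikit-learn
# releases removed the built-in loader, so we assume a CSV with the usual
# columns and the median home value in "MEDV".
boston = pd.read_csv("boston.csv")
X_real = MinMaxScaler().fit_transform(boston.drop(columns=["MEDV"]))  # scale inputs to [0, 1]
y = boston["MEDV"].to_numpy()

# Pad with d - d0 noisy Uniform([0, 1]) input variables, as the paper describes.
d = 100                 # illustrative ambient dimension; the paper sweeps d
d0 = X_real.shape[1]    # informative inputs actually present in the CSV
noise = rng.uniform(0.0, 1.0, size=(len(y), d - d0))
X = np.hstack([X_real, noise])  # shape (506, d); feed to the same CART/k-NN comparison as above
```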