Sparse Learning with CART

Authors: Jason Klusowski

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Fig. 1a and Fig. 1c, we generate 1000 samples from the model $Y = \sum_{j=1}^{d_0} g_j(X_j)$, where each $g_j(X_j)$ equals $\pm X_j^2$ (alternating signs) and $X \sim \text{Uniform}([0, 1]^d)$. In Fig. 1a, we plot the test error, averaged over 10 independent replications, of pruned CART vs. k-NN (with cross-validated k) as $d$ ranges from 5 to 100 with $d_0 = 5$ fixed. A similar experiment is performed in Fig. 1b on the Boston housing dataset [4, Section 8.2] ($d_0 = 10$ and $n = 506$), where we scale the inputs to be in $[0, 1]$ and add $d - d_0$ noisy $\text{Uniform}([0, 1])$ input variables.
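A minimal sketch of the synthetic experiment described above, assuming a Python/scikit-learn implementation (the paper does not name its software; the `ccp_alpha` grid, the `k` grid, and the random seed are illustrative assumptions, not values from the paper):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, d, d0 = 1000, 50, 5          # the paper sweeps d over 5..100 with d0 = 5

# Y = sum_{j=1}^{d0} g_j(X_j) with g_j(x) = +/- x^2 (alternating signs)
X = rng.uniform(0.0, 1.0, size=(n, d))
signs = (-1.0) ** np.arange(d0)
y = (signs * X[:, :d0] ** 2).sum(axis=1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pruned CART: cost-complexity pruning with alpha chosen by cross-validation.
cart = GridSearchCV(DecisionTreeRegressor(random_state=0),
                    {"ccp_alpha": np.logspace(-5, -1, 20)}, cv=5)
cart.fit(X_tr, y_tr)

# k-NN with cross-validated k.
knn = GridSearchCV(KNeighborsRegressor(),
                   {"n_neighbors": list(range(1, 31))}, cv=5)
knn.fit(X_tr, y_tr)

print("CART test MSE:", ((cart.predict(X_te) - y_te) ** 2).mean())
print("k-NN test MSE:", ((knn.predict(X_te) - y_te) ** 2).mean())
```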
Researcher Affiliation | Academia | Jason M. Klusowski, Department of Operations Research & Financial Engineering, Princeton University, Princeton, New Jersey 08544; jason.klusowski@princeton.edu
Pseudocode | No | The paper describes algorithms and procedures in prose, such as 'The CART algorithm is comprised of two elements: a growing procedure and a pruning procedure.' However, it does not include any formally structured pseudocode blocks or algorithm listings.
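For orientation only, a toy sketch of the growing element described in that prose (greedy squared-error splitting); the pruning element, weakest-link cost-complexity pruning, is omitted. This is a didactic reconstruction, not pseudocode from the paper:

```python
import numpy as np

def best_split(X, y):
    """Greedy CART split: the (feature, threshold) minimizing total SSE."""
    best, base = None, ((y - y.mean()) ** 2).sum()
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:     # thresholds between observed values
            left = X[:, j] <= t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return best if best is not None and best[0] < base else None

def grow(X, y, min_samples=10):
    """Recursively split until no SSE-reducing split remains."""
    split = best_split(X, y) if len(y) >= min_samples else None
    if split is None:
        return {"pred": float(y.mean())}      # leaf: predict the cell mean
    _, j, t = split
    left = X[:, j] <= t
    return {"feature": j, "threshold": float(t),
            "left": grow(X[left], y[left], min_samples),
            "right": grow(X[~left], y[~left], min_samples)}
```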
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to a code repository.
Open Datasets | Yes | A similar experiment is performed in Fig. 1b on the Boston housing dataset [4, Section 8.2] ($d_0 = 10$ and $n = 506$), where we scale the inputs to be in $[0, 1]$ and add $d - d_0$ noisy $\text{Uniform}([0, 1])$ input variables. Reference [4] is: Leo Breiman, Jerome Friedman, R. A. Olshen, and Charles J. Stone. Classification and Regression Trees. Chapman and Hall/CRC, 1984.
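A hypothetical loading sketch for this setup; fetching the Boston data from OpenML, the scaler, and the noise seed are assumptions, since the paper only cites [4]. Note the paper reports $d_0 = 10$ while the standard Boston copy has 13 predictors, so any column selection is left unspecified here:

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import MinMaxScaler

boston = fetch_openml(name="boston", version=1, as_frame=False)
X, y = boston.data.astype(float), boston.target.astype(float)   # n = 506

X = MinMaxScaler().fit_transform(X)       # scale inputs to [0, 1]

d = 100                                   # illustrative total dimension
rng = np.random.default_rng(0)
noise = rng.uniform(0.0, 1.0, size=(X.shape[0], d - X.shape[1]))
X_aug = np.hstack([X, noise])             # append the noisy input variables
```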
Dataset Splits | No | In Fig. 1a, we plot the test error, averaged over 10 independent replications, of pruned CART vs. k-NN (with cross-validated k)... While 'cross-validated k' implies a validation process, the paper does not specify the explicit splits (e.g., percentages or counts) for training, validation, and test sets needed to reproduce the data partitioning.
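For context, one conventional realization of "cross-validated k", continuing from the loading sketch above; the 80/20 split, the 5 folds, and the k grid are assumptions the paper does not state:

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# X_aug, y as constructed in the previous sketch
X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, test_size=0.2,
                                          random_state=0)
search = GridSearchCV(KNeighborsRegressor(),
                      {"n_neighbors": list(range(1, 51))},
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X_tr, y_tr)
print("selected k:", search.best_params_["n_neighbors"])
print("held-out test MSE:", ((search.predict(X_te) - y_te) ** 2).mean())
```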
Hardware Specification | No | The paper does not provide any specific details about the hardware used to conduct the experiments, such as CPU/GPU models or memory specifications.
Software Dependencies | No | The paper mentions general techniques and algorithms like 'bagging [2] and random forests [3]' or 'Tree Boost [7]' but does not list specific software libraries or packages with their version numbers that were used for implementation.
Experiment Setup | No | The paper describes the synthetic data generation and the use of the Boston housing dataset, mentioning that $d$ ranges from 5 to 100 with $d_0 = 5$ fixed, or $d_0 = 10$ and $n = 506$, and that k-NN uses a cross-validated k. However, it does not report the hyperparameters or configuration of the CART and k-NN models themselves (e.g., the pruning parameter or maximum depth for CART, the candidate range of k for k-NN) that would allow full reproduction of the experimental setup.
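To illustrate what such an unreported configuration could look like for pruned CART (continuing the sketches above), minimal cost-complexity pruning with alpha selected by cross-validation is one standard choice; none of these settings come from the paper:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Candidate pruning strengths from the cost-complexity pruning path.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Cross-validate each candidate alpha and refit the tree at the best one.
scores = [cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                          X_tr, y_tr, cv=5,
                          scoring="neg_mean_squared_error").mean()
          for a in path.ccp_alphas]
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
pruned_cart = DecisionTreeRegressor(ccp_alpha=best_alpha,
                                    random_state=0).fit(X_tr, y_tr)
```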