Smooth And Consistent Probabilistic Regression Trees

Authors: Sami Alkhoury, Emilie Devijver, Marianne Clausel, Myriam Tami, Eric Gaussier, Georges Oppenheim

NeurIPS 2020

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
  LLM response: "Lastly, we assess their performance through extensive experiments that illustrate their benefits in terms of performance, interpretability and robustness to noise."
Researcher Affiliation: Academia
  LLM response:
  Sami Alkhoury, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France (sami.alkhoury@univ-grenoble-alpes.fr)
  Emilie Devijver, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France (emilie.devijver@univ-grenoble-alpes.fr)
  Marianne Clausel, Université de Lorraine, CNRS, IECL, Nancy, France (marianne.clausel@univ-lorraine.fr)
  Myriam Tami, Univ. Paris-Saclay, Centrale Supélec, MICS, Gif-sur-Yvette, France (myriam.tami@centralesupelec.fr)
  Eric Gaussier, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France (eric.gaussier@imag.fr)
  Georges Oppenheim, Univ. Paris-Est-Marne la Vallée, Département de Mathématiques, Marne-la-Vallée, France (georges.oppenheim@gmail.com)
Pseudocode: No
  LLM response: The paper describes algorithms and procedures in prose and mathematical notation but does not include formal pseudocode blocks or algorithms labeled as such.
Open Source Code: Yes
  LLM response: "PR trees are built on top of this implementation, and the code for this implementation is available in https://gitlab.com/sami.kh/pr-tree."
Open Datasets: Yes
  LLM response: "We make use here of 13 data sets of various size, namely (ordered by increasing sample size) Riboflavin (RI), Ozone (OZ), Diabetes (DI), Abalone (AB), Boston (BO), Bike-Day (BD), E2006, Skill (SK), Ailerons (AL), Bike-Hour (BH), Super Conductor (SC), Facebook Comments (FC) and Video Transcoding (VT), all commonly used in regression tasks. ... Full details on these datasets (sample size and number of variables, location) are given in the Supplementary Material."
Dataset Splits: Yes
  LLM response: "Each fold is divided into 80% for train and 20% for test, except for Soft trees and PR trees and their gradient boosted extension (see below) for which each fold is divided into 65% for train, 15% for validation and 20% for test."
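The split scheme quoted above can be sketched as follows. This is a hypothetical helper, not code from the paper: it shuffles sample indices and slices them 65/15/20 for the methods that tune on a validation set, and 80/20 otherwise.

```python
import numpy as np

def split_indices(n, validation=False, seed=0):
    """Shuffle n sample indices and slice them into train/(val)/test
    in the proportions quoted above: 65/15/20 or 80/20."""
    idx = np.random.default_rng(seed).permutation(n)
    if validation:  # Soft trees, PR trees, and their gradient-boosted extensions
        a, b = int(0.65 * n), int(0.80 * n)
        return idx[:a], idx[a:b], idx[b:]
    a = int(0.80 * n)
    return idx[:a], idx[a:]
```

In the paper this split is applied within each fold of the stratified 10-fold cross-validation; the sketch above only illustrates the proportions.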
Hardware Specification: Yes
  LLM response: "All experiments were conducted on a 256 GB RAM server with 32 CPUs at 2.60GHz."
Software Dependencies: No
  LLM response: The paper mentions using "Scikit-Learn [29]" for standard regression trees and their ensemble extensions, but it does not specify a version number for this or any other software dependency.
Experiment Setup: Yes
  LLM response: "Lastly, for both PR and standard regression trees, the stopping criterion is the same in all experiments: all leaves should contain at least 10% of the training data. ... For PR trees and their gradient boosted extension, it is used to estimate the noise vector σ through a grid search taking values, for each variable j, 1 ≤ j ≤ p, in the interval [0, 2σ̂_j] with a step of σ̂_j/4, where σ̂_j denotes the empirical standard deviation of variable j. ... We use stratified 10-fold cross-validation to evaluate the performance of each method."
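The grid described in the quote above is easy to make concrete. A minimal sketch, assuming a samples-by-variables data matrix (the function name and the use of the uncorrected standard deviation are assumptions, not details from the paper): for each variable j the candidates run from 0 to 2σ̂_j in steps of σ̂_j/4, i.e. nine values per variable.

```python
import numpy as np

def sigma_grid(X):
    """Candidate values for the noise vector sigma, per variable:
    multiples 0, 1/4, 2/4, ..., 8/4 of the empirical standard
    deviation sigma_hat_j, covering [0, 2*sigma_hat_j] with step
    sigma_hat_j / 4 (nine candidates per variable)."""
    sigma_hat = X.std(axis=0)             # empirical std of each variable
    multipliers = np.arange(9) / 4.0      # 0, 0.25, ..., 2.0
    return np.outer(sigma_hat, multipliers)  # shape (p, 9)
```

The actual selection in the paper picks, via the validation set, which candidate to use; the sketch only constructs the search grid.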