Smooth And Consistent Probabilistic Regression Trees
Authors: Sami Alkhoury, Emilie Devijver, Marianne Clausel, Myriam Tami, Eric Gaussier, Georges Oppenheim
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we assess their performance through extensive experiments that illustrate their benefits in terms of performance, interpretability and robustness to noise. |
| Researcher Affiliation | Academia | Sami Alkhoury, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France, sami.alkhoury@univ-grenoble-alpes.fr; Emilie Devijver, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France, emilie.devijver@univ-grenoble-alpes.fr; Marianne Clausel, Université de Lorraine, CNRS, IECL, Nancy, France, marianne.clausel@univ-lorraine.fr; Myriam Tami, Univ. Paris-Saclay, CentraleSupélec, MICS, Gif-sur-Yvette, France, myriam.tami@centralesupelec.fr; Eric Gaussier, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France, eric.gaussier@imag.fr; Georges Oppenheim, Univ. Paris-Est-Marne la Vallée, Département de Mathématiques, Marne-la-Vallée, France, georges.oppenheim@gmail.com |
| Pseudocode | No | The paper describes algorithms and procedures in prose and mathematical notation but does not include formal pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | PR trees are built on top of this implementation, and the code for this implementation is available in https://gitlab.com/sami.kh/pr-tree. |
| Open Datasets | Yes | We make use here of 13 data sets of various size, namely (ordered by increasing sample size) Riboflavin (RI), Ozone (OZ), Diabetes (DI), Abalone (AB), Boston (BO), Bike-Day (BD), E2006, Skill (SK), Ailerons (AL), Bike-Hour (BH), Super Conductor (SC), Facebook Comments (FC) and Video Transcoding (VT), all commonly used in regression tasks. ... Full details on these datasets (sample size and number of variables, location) are given in the Supplementary Material. |
| Dataset Splits | Yes | Each fold is divided into 80% for train and 20% for test, except for Soft trees and PR trees and their gradient boosted extension (see below) for which each fold is divided into 65% for train, 15% for validation and 20% for test. |
| Hardware Specification | Yes | All experiments were conducted on a 256 GB RAM server with 32 CPUs at 2.60GHz. |
| Software Dependencies | No | The paper mentions using 'Scikit-Learn [29]' for standard regression trees and their ensemble extensions, but it does not specify a version number for this or any other software dependency. |
| Experiment Setup | Yes | Lastly, for both PR and standard regression trees, the stopping criterion is the same in all experiments: all leaves should contain at least 10% of the training data. ... For PR trees and their gradient boosted extension, it is used to estimate the noise vector σ through a grid search taking values, for each variable j, 1 ≤ j ≤ p, in the interval [0, 2σ̂_j] with a step of σ̂_j/4, where σ̂_j denotes the empirical standard deviation of variable j. ... We use stratified 10-fold cross-validation to evaluate the performance of each method. |
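
The split percentages quoted in the Dataset Splits row translate directly into code. The sketch below shows one way to reproduce the 80/20 and 65/15/20 partitions with scikit-learn; the toy arrays `X`, `y` and the random seeds are placeholders and not taken from the authors' implementation.

```python
# A minimal sketch of the two splitting schemes quoted in the
# "Dataset Splits" row, using scikit-learn. X, y and the seeds are
# placeholders; this is not the authors' code.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X, y = rng.randn(1000, 5), rng.randn(1000)  # toy data

# Standard regression trees: 80% train / 20% test.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, random_state=0)

# Soft trees and PR trees: 65% train / 15% validation / 20% test.
X_rest, X_te, y_rest, y_te = train_test_split(X, y, test_size=0.20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.80, random_state=0  # 15% of the full set
)
```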
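
The Experiment Setup row also pins down the grid over which the noise vector σ is searched: for each variable j, nine candidate values from 0 to 2σ̂_j in steps of σ̂_j/4. Below is a minimal sketch of that grid construction; `sigma_grid` is a hypothetical helper name, and the quote does not state whether the subsequent search is joint or coordinate-wise, so only the grid itself is shown.

```python
# A minimal sketch of the noise grid described in the "Experiment Setup"
# row: for each variable j, candidates span [0, 2*sigma_hat_j] with step
# sigma_hat_j/4. The function name is illustrative, not the authors' API.
import numpy as np

def sigma_grid(X):
    """Return an array of shape (9, p): one candidate column per variable."""
    sigma_hat = X.std(axis=0)               # empirical std of each variable
    steps = np.arange(9).reshape(-1, 1)     # multipliers 0, 1, ..., 8
    return steps * (sigma_hat / 4.0)        # rows: 0, sigma_hat/4, ..., 2*sigma_hat

X = np.random.RandomState(0).randn(200, 3)
grid = sigma_grid(X)
assert np.allclose(grid[-1], 2 * X.std(axis=0))  # top of interval is 2*sigma_hat_j
```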