Smooth And Consistent Probabilistic Regression Trees

Authors: Sami Alkhoury, Emilie Devijver, Marianne Clausel, Myriam Tami, Eric Gaussier, Georges Oppenheim

NeurIPS 2020

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
  LLM response: "Lastly, we assess their performance through extensive experiments that illustrate their benefits in terms of performance, interpretability and robustness to noise."
Researcher Affiliation: Academia
  LLM response:
  Sami Alkhoury, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France (sami.alkhoury@univ-grenoble-alpes.fr)
  Emilie Devijver, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France (emilie.devijver@univ-grenoble-alpes.fr)
  Marianne Clausel, Université de Lorraine, CNRS, IECL, Nancy, France (marianne.clausel@univ-lorraine.fr)
  Myriam Tami, Univ. Paris-Saclay, Centrale Supélec, MICS, Gif-sur-Yvette, France (myriam.tami@centralesupelec.fr)
  Eric Gaussier, Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France (eric.gaussier@imag.fr)
  Georges Oppenheim, Univ. Paris-Est-Marne la Vallée, Département de Mathématiques, Marne-la-Vallée, France (georges.oppenheim@gmail.com)
Pseudocode: No
  LLM response: The paper describes algorithms and procedures in prose and mathematical notation but does not include formal pseudocode blocks or algorithms labeled as such.
Open Source Code: Yes
  LLM response: "PR trees are built on top of this implementation, and the code for this implementation is available in https://gitlab.com/sami.kh/pr-tree."
Open Datasets: Yes
  LLM response: "We make use here of 13 data sets of various size, namely (ordered by increasing sample size) Riboflavin (RI), Ozone (OZ), Diabetes (DI), Abalone (AB), Boston (BO), Bike-Day (BD), E2006, Skill (SK), Ailerons (AL), Bike-Hour (BH), Super Conductor (SC), Facebook Comments (FC) and Video Transcoding (VT), all commonly used in regression tasks. ... Full details on these datasets (sample size and number of variables, location) are given in the Supplementary Material."
Dataset Splits: Yes
  LLM response: "Each fold is divided into 80% for train and 20% for test, except for Soft trees and PR trees and their gradient boosted extension (see below) for which each fold is divided into 65% for train, 15% for validation and 20% for test."
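The split scheme quoted above can be sketched as follows. This is a hypothetical helper, not code from the paper: it shuffles sample indices and slices them 65/15/20 for the methods that tune on a validation set, and 80/20 otherwise.

```python
import numpy as np

def split_indices(n, validation=False, seed=0):
    """Shuffle n sample indices and slice them into train/(val)/test
    in the proportions quoted above: 65/15/20 or 80/20."""
    idx = np.random.default_rng(seed).permutation(n)
    if validation:  # Soft trees, PR trees, and their gradient-boosted extensions
        a, b = int(0.65 * n), int(0.80 * n)
        return idx[:a], idx[a:b], idx[b:]
    a = int(0.80 * n)
    return idx[:a], idx[a:]
```

In the paper this split is applied within each fold of the stratified 10-fold cross-validation; the sketch above only illustrates the proportions.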
Hardware Specification: Yes
  LLM response: "All experiments were conducted on a 256 GB RAM server with 32 CPUs at 2.60GHz."
Software Dependencies: No
  LLM response: The paper mentions using "Scikit-Learn [29]" for standard regression trees and their ensemble extensions, but it does not specify a version number for this or any other software dependency.
Experiment Setup: Yes
  LLM response: "Lastly, for both PR and standard regression trees, the stopping criterion is the same in all experiments: all leaves should contain at least 10% of the training data. ... For PR trees and their gradient boosted extension, it is used to estimate the noise vector σ through a grid search taking values, for each variable j, 1 ≤ j ≤ p, in the interval [0, 2σ̂_j] with a step of σ̂_j/4, where σ̂_j denotes the empirical standard deviation of variable j. ... We use stratified 10-fold cross-validation to evaluate the performance of each method."
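The grid described in the quote above is easy to make concrete. A minimal sketch, assuming a samples-by-variables data matrix (the function name and the use of the uncorrected standard deviation are assumptions, not details from the paper): for each variable j the candidates run from 0 to 2σ̂_j in steps of σ̂_j/4, i.e. nine values per variable.

```python
import numpy as np

def sigma_grid(X):
    """Candidate values for the noise vector sigma, per variable:
    multiples 0, 1/4, 2/4, ..., 8/4 of the empirical standard
    deviation sigma_hat_j, covering [0, 2*sigma_hat_j] with step
    sigma_hat_j / 4 (nine candidates per variable)."""
    sigma_hat = X.std(axis=0)             # empirical std of each variable
    multipliers = np.arange(9) / 4.0      # 0, 0.25, ..., 2.0
    return np.outer(sigma_hat, multipliers)  # shape (p, 9)
```

The actual selection in the paper picks, via the validation set, which candidate to use; the sketch only constructs the search grid.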