Joints in Random Forests
Authors: Alvaro Correia, Robert Peharz, Cassio P. de Campos
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our models often outperform common routines to treat missing data, such as K-nearest neighbour imputation, and moreover, that our models can naturally detect outliers by monitoring the marginal probability of input features. |
| Researcher Affiliation | Collaboration | Alvaro H. C. Correia (a.h.chaim.correia@tue.nl), Eindhoven University of Technology; Robert Peharz (r.peharz@tue.nl), Eindhoven University of Technology; Cassio de Campos (c.decampos@tue.nl), Eindhoven University of Technology ... During part of the three years prior to the submission of this work, the authors were affiliated with the following institutions besides TU Eindhoven: Alvaro Correia was a full-time employee at Accenture and Itaú-Unibanco, and affiliated with Utrecht University; Cassio de Campos was affiliated with Queen's University Belfast and Utrecht University; Robert Peharz was affiliated with the University of Cambridge. |
| Pseudocode | Yes | Algorithm 1: Converting DT to PC (GeDT). |
| Open Source Code | No | The paper does not provide an explicit statement about the release of its source code (e.g., 'Our code is available at...'), nor does it link to a code repository for the implemented methodology. It refers to 'LearnSPN [16]', a prominent PC learner, implying its use, but the authors' specific implementation is not released. |
| Open Datasets | Yes | We compare the accuracy of the methods in a selection of datasets from the OpenML-CC18 benchmark [51] and the wine-quality dataset [33]. ... We repeat a similar experiment with images, where we use the MNIST dataset [27] to fit a Gaussian KDE, a Random Forest and its corresponding GeF+. We then evaluate these models on different digit datasets, namely Semeion [11] and SVHN [34] (converted to grayscale and 784 pixels)... |
| Dataset Splits | Yes | Table 1 presents results for 30% of missing values at test time (different percentages are shown in the supp. material), with 95% confidence intervals across 10 repetitions of 5-fold cross-validation. ... We then compute the log-density of unseen data (70/30 train-test split) for the two wine types with both models. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions software tools like 'LearnSPN' but does not provide version numbers for any software dependencies, which would be required for a reproducible setup. |
| Experiment Setup | Yes | In all experiments, GeF, GeF(LearnSPN) and the RF share the exact same structure (partition over the feature space) and are composed of 100 trees; including more trees has been shown to yield only marginal gains in most cases [39]. In GeF(LearnSPN), we run LearnSPN only for leaves with more than 30 samples, defaulting to a fully factorised model in smaller leaves. |
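The evaluation protocol quoted above (10 repetitions of 5-fold cross-validation, 30% of feature values masked at test time, a 100-tree Random Forest, and KNN imputation as the missing-data baseline) can be sketched as follows. This is a minimal illustration, not the paper's code: the dataset, random seeds, number of imputation neighbours, and the MCAR masking scheme are assumptions made for the example.

```python
# Illustrative sketch of the paper's baseline protocol: a 100-tree Random
# Forest evaluated under 30% missing values at test time, with K-nearest-
# neighbour imputation filling the gaps. Dataset and hyperparameters are
# hypothetical choices, not taken from the paper.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.model_selection import RepeatedStratifiedKFold

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# 10 repetitions of 5-fold cross-validation, as reported in Table 1.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = []
for train_idx, test_idx in cv.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx].copy()
    # Mask 30% of test-set entries completely at random (MCAR).
    mask = rng.random(X_test.shape) < 0.3
    X_test[mask] = np.nan
    # KNN imputation fitted on the fully observed training data.
    imputer = KNNImputer(n_neighbors=5).fit(X_train)
    X_test_imputed = imputer.transform(X_test)
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y[train_idx])
    scores.append(rf.score(X_test_imputed, y[test_idx]))

mean = float(np.mean(scores))
# 95% confidence interval over the 50 fold scores (normal approximation).
half_width = 1.96 * np.std(scores, ddof=1) / np.sqrt(len(scores))
print(f"accuracy: {mean:.3f} +/- {half_width:.3f}")
```

The paper's contribution (GeF) would replace the imputation step entirely, since the generative forest can marginalise missing features directly; the sketch only shows the KNN-imputation comparison point.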