Statistical Inference for Cluster Trees
Authors: Jisu Kim, Yen-Chi Chen, Sivaraman Balakrishnan, Alessandro Rinaldo, Larry Wasserman
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we illustrate the proposed methods on a variety of synthetic examples and furthermore demonstrate their utility in the analysis of a Graft-versus-Host Disease (GvHD) data set. In this section, we demonstrate the techniques we have developed for inference on synthetic data, as well as on a real dataset. Figure 4 shows those data ((a), (b), and (c)) along with the pruned density trees (solid parts in (d), (e), and (f)). |
| Researcher Affiliation | Academia | Jisu Kim, Department of Statistics, Carnegie Mellon University, Pittsburgh, USA, jisuk1@andrew.cmu.edu; Yen-Chi Chen, Department of Statistics, University of Washington, Seattle, USA, yenchic@uw.edu; Sivaraman Balakrishnan, Department of Statistics, Carnegie Mellon University, Pittsburgh, USA, siva@stat.cmu.edu; Alessandro Rinaldo, Department of Statistics, Carnegie Mellon University, Pittsburgh, USA, arinaldo@stat.cmu.edu; Larry Wasserman, Department of Statistics, Carnegie Mellon University, Pittsburgh, USA, larry@stat.cmu.edu |
| Pseudocode | No | The paper describes its pruning operations as numbered steps ("1. Pruning only leaves" and "2. Pruning leaves and internal branches") but does not present them as structured pseudocode or in an algorithm block. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | Now we apply our method to the GvHD (Graft-versus-Host Disease) dataset [3]. GvHD is a complication that may occur when transplanting bone marrow or stem cells from one subject to another [3]. We obtained the GvHD dataset from R package mclust. [3] R. R. Brinkman, M. Gasparetto, S.-J. J. Lee, A. J. Ribickas, J. Perkins, W. Janssen, R. Smiley, and C. Smith. High-content flow cytometry and temporal data analysis for defining a cellular signature of graft-versus-host disease. Biology of Blood and Marrow Transplantation, 13(6):691-700, 2007. |
| Dataset Splits | No | The paper uses bootstrap sampling for statistical inference and constructing confidence sets but does not specify traditional train/validation/test dataset splits for model training or evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper mentions obtaining the GvHD dataset from 'R package mclust' but does not provide specific version numbers for R or the mclust package. |
| Experiment Setup | Yes | The smoothing bandwidth is chosen by the Silverman reference rule [20] and we pick the significance level α = 0.05. By the normal reference rule [20], we pick h = 39.1 for the positive sample and h = 42.2 for the control sample. We set the significance level α = 0.05. |
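
The setup quoted in the table (a reference-rule bandwidth, a bootstrap, and α = 0.05) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the bandwidth uses one common multivariate form of the normal reference rule (the paper's exact variant is not stated here), the kernel is assumed Gaussian, and the supremum is approximated over the sample points only.

```python
import numpy as np

def normal_reference_bandwidth(X):
    """One common multivariate normal reference rule (assumed form, not taken from the paper)."""
    n, d = X.shape
    sigma = np.mean(np.std(X, axis=0, ddof=1))  # average marginal standard deviation
    return sigma * (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4))

def gaussian_kde(X, query, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated at the query points."""
    n, d = X.shape
    sq_dist = ((query[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    norm = n * (2.0 * np.pi) ** (d / 2.0) * h ** d
    return np.exp(-sq_dist / (2.0 * h ** 2)).sum(axis=1) / norm

def bootstrap_sup_quantile(X, h, alpha=0.05, B=200, seed=0):
    """(1 - alpha) quantile of max |KDE(bootstrap sample) - KDE(data)| over the sample points."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    base = gaussian_kde(X, X, h)
    sups = np.empty(B)
    for b in range(B):
        Xb = X[rng.integers(0, n, size=n)]  # resample rows with replacement
        sups[b] = np.max(np.abs(gaussian_kde(Xb, X, h) - base))
    return np.quantile(sups, 1.0 - alpha)

# Synthetic stand-in data; the GvHD samples themselves are not reproduced here.
X = np.random.default_rng(1).normal(size=(200, 2))
h = normal_reference_bandwidth(X)
t_alpha = bootstrap_sup_quantile(X, h, alpha=0.05, B=50)
```

In this sketch, `t_alpha` plays the role of the half-width of a uniform confidence band around the density estimate, which is the kind of quantity a pruning rule for the cluster tree could compare branch heights against.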