Statistical Inference for Cluster Trees

Authors: Jisu Kim, Yen-Chi Chen, Sivaraman Balakrishnan, Alessandro Rinaldo, Larry Wasserman

NeurIPS 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we illustrate the proposed methods on a variety of synthetic examples and furthermore demonstrate their utility in the analysis of a Graft-versus-Host Disease (GvHD) data set. In this section, we demonstrate the techniques we have developed for inference on synthetic data, as well as on a real dataset. Figure 4 shows those data ((a), (b), and (c)) along with the pruned density trees (solid parts in (d), (e), and (f)). |
| Researcher Affiliation | Academia | Jisu Kim, Department of Statistics, Carnegie Mellon University, Pittsburgh, USA, jisuk1@andrew.cmu.edu; Yen-Chi Chen, Department of Statistics, University of Washington, Seattle, USA, yenchic@uw.edu; Sivaraman Balakrishnan, Department of Statistics, Carnegie Mellon University, Pittsburgh, USA, siva@stat.cmu.edu; Alessandro Rinaldo, Department of Statistics, Carnegie Mellon University, Pittsburgh, USA, arinaldo@stat.cmu.edu; Larry Wasserman, Department of Statistics, Carnegie Mellon University, Pittsburgh, USA, larry@stat.cmu.edu |
| Pseudocode | No | The paper describes pruning operations in numbered steps ("1. Pruning only leaves" and "2. Pruning leaves and internal branches") but does not present them in a structured pseudocode or algorithm block format. |
| Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code. |
| Open Datasets | Yes | Now we apply our method to the GvHD (Graft-versus-Host Disease) dataset [3]. GvHD is a complication that may occur when transplanting bone marrow or stem cells from one subject to another [3]. We obtained the GvHD dataset from R package mclust. [3] R. R. Brinkman, M. Gasparetto, S.-J. J. Lee, A. J. Ribickas, J. Perkins, W. Janssen, R. Smiley, and C. Smith. High-content flow cytometry and temporal data analysis for defining a cellular signature of graft-versus-host disease. Biology of Blood and Marrow Transplantation, 13(6):691-700, 2007. |
| Dataset Splits | No | The paper uses bootstrap sampling for statistical inference and constructing confidence sets but does not specify traditional train/validation/test dataset splits for model training or evaluation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper mentions obtaining the GvHD dataset from "R package mclust" but does not provide specific version numbers for R or the mclust package. |
| Experiment Setup | Yes | The smoothing bandwidth is chosen by the Silverman reference rule [20] and we pick the significance level α = 0.05. By the normal reference rule [20], we pick h = 39.1 for the positive sample and h = 42.2 for the control sample. We set the significance level α = 0.05. |