Mode Estimation for High Dimensional Discrete Tree Graphical Models

Authors: Chao Chen, Han Liu, Dimitris Metaxas, Tianqi Zhao

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate the accuracy and efficiency of our algorithm. More theoretical guarantees for our algorithm can be found in [7]. To validate our method, we first show the scalability and accuracy of our algorithm on synthetic data. Furthermore, we demonstrate using biological data how modes can be used as a novel analysis tool.
Researcher Affiliation Academia Chao Chen, Department of Computer Science, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8019, chao.chen.cchen@gmail.com; Han Liu, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, hanliu@princeton.edu; Dimitris N. Metaxas, Department of Computer Science, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8019, dnm@cs.rutgers.edu; Tianqi Zhao, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, tianqi@princeton.edu
Pseudocode Yes Procedure 1 Compute-M-Modes
Input: A tree G, a potential function f, and a scale δ
Output: The M modes of the lowest potential
1: Construct geodesic balls B = {B_r(c) | c ∈ V}, where r = δ/2 + 1
2: for all B ∈ B do
3:   M^δ_B = the set of local modes of B
4: Construct a junction tree (Figure 2); the label set of each supernode is its local modes
5: Compute the M lowest-potential labelings of the junction tree, using Nilsson's algorithm
(An illustrative Python sketch related to this procedure is given after the table.)
Open Source Code No The paper does not provide any link to source code, nor does it state that code for the described methodology is publicly available.
Open Datasets Yes Biological data analysis. We compute modes of the microarray data of Arabidopsis thaliana plant (108 samples, 39 dimensions) [24].
Dataset Splits No The paper mentions generating synthetic data and using different sample sizes (10K, 40K, 80K) for evaluation, and also uses a biological dataset, but it does not specify explicit training, validation, or test dataset splits.
Hardware Specification No The paper discusses running time and scalability, but does not provide any specific hardware details such as GPU or CPU models used for the experiments.
Software Dependencies No The paper describes algorithms and methods but does not list any specific software dependencies with version numbers.
Experiment Setup Yes In all experiments, we choose M to be 500. We randomly generate tree-structured graphical models (tree size D = 200, ..., 2000, label size L = 3) and test the speed. We randomly generate tree-structured distributions (D = 20, L = 2). To evaluate the sensitivity of our method to noise, we randomly flip 0%, 5%, 10%, 15% and 20% of the labels of these samples. (An illustrative generator sketch for this synthetic setup also follows below.)
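To make Procedure 1 concrete, below is a minimal Python sketch: it builds geodesic balls on a tree via breadth-first search (step 1) and, for a toy tree MRF small enough to enumerate, finds modes by brute force so the output of the full procedure could be sanity-checked. Everything here is an assumption for illustration, not the authors' code (which is not released): the function names, the toy potentials, and in particular the simplified mode definition (a labeling is treated as a mode if no single-node change lowers its potential), which stands in for the paper's scale-δ definition. The junction-tree construction and Nilsson's M-best algorithm (steps 4-5) are not implemented.

    # Illustrative sketch only; names, potentials, and the Hamming-1 mode
    # definition are assumptions standing in for the paper's scale-delta setup.
    from collections import deque
    from itertools import product

    def geodesic_ball(adj, center, radius):
        """Nodes within tree (hop) distance `radius` of `center`, via BFS (step 1)."""
        dist = {center: 0}
        queue = deque([center])
        while queue:
            u = queue.popleft()
            if dist[u] == radius:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return set(dist)

    def potential(x, node_pot, edge_pot):
        """f(x) = sum of unary and pairwise potentials (lower = more probable)."""
        val = sum(node_pot[u][x[u]] for u in range(len(x)))
        val += sum(ep[x[u]][x[v]] for (u, v), ep in edge_pot.items())
        return val

    def brute_force_modes(num_nodes, num_labels, node_pot, edge_pot, M):
        """All Hamming-1 local minima of f, sorted by potential; keep the M best."""
        modes = []
        for x in product(range(num_labels), repeat=num_nodes):
            fx = potential(x, node_pot, edge_pot)
            is_mode = True
            for u in range(num_nodes):
                for lab in range(num_labels):
                    if lab == x[u]:
                        continue
                    y = x[:u] + (lab,) + x[u + 1:]
                    if potential(y, node_pot, edge_pot) < fx:
                        is_mode = False
                        break
                if not is_mode:
                    break
            if is_mode:
                modes.append((fx, x))
        return sorted(modes)[:M]

    if __name__ == "__main__":
        # Toy chain-shaped tree with D = 4 nodes and L = 2 labels.
        adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
        node_pot = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
        edge_pot = {(0, 1): [[0.0, 0.5], [0.5, 0.0]],
                    (1, 2): [[0.0, 0.5], [0.5, 0.0]],
                    (2, 3): [[0.0, 0.5], [0.5, 0.0]]}
        print(geodesic_ball(adj, center=1, radius=1))            # step 1: {0, 1, 2}
        print(brute_force_modes(4, 2, node_pot, edge_pot, M=3))  # brute-force modes

The brute-force check scales as L^D and is only usable on toy instances; the point of the paper's procedure is precisely to avoid this enumeration on high-dimensional trees.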
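The synthetic setup quoted above (random tree-structured distributions, noise via label flipping) can likewise be sketched. The paper does not specify its exact generator, so everything below is an assumption for demonstration: a random tree grown by uniform attachment, uniformly random pairwise potentials, ancestral sampling from a directed tree model (root uniform, each child drawn with probability proportional to exp of minus the edge potential), and independent label flips at a given rate.

    # Hedged sketch of a synthetic setup; the paper's actual generator is unspecified.
    import math
    import random

    def random_tree(num_nodes):
        # Each new node attaches to a uniformly chosen earlier node,
        # which always yields a tree on num_nodes vertices.
        return [(random.randrange(i), i) for i in range(1, num_nodes)]

    def random_edge_potentials(edges, num_labels):
        # Uniformly random pairwise potentials; lower potential = more probable.
        return {e: [[random.random() for _ in range(num_labels)]
                    for _ in range(num_labels)] for e in edges}

    def sample_labeling(num_nodes, num_labels, edges, edge_pot):
        # Ancestral sampling from a directed tree rooted at node 0: the root is
        # uniform, each child is drawn given its parent with probability
        # proportional to exp(-edge potential).
        x = [0] * num_nodes
        x[0] = random.randrange(num_labels)
        for parent, child in edges:  # parent index < child index by construction
            weights = [math.exp(-edge_pot[(parent, child)][x[parent]][lab])
                       for lab in range(num_labels)]
            x[child] = random.choices(range(num_labels), weights=weights)[0]
        return x

    def flip_labels(sample, num_labels, flip_rate):
        # Independently flip each label with probability flip_rate
        # (the paper evaluates flip rates of 0%, 5%, 10%, 15% and 20%).
        noisy = list(sample)
        for i, lab in enumerate(noisy):
            if random.random() < flip_rate:
                noisy[i] = random.choice([l for l in range(num_labels) if l != lab])
        return noisy

    if __name__ == "__main__":
        D, L = 20, 2                  # sizes used in the paper's sensitivity study
        edges = random_tree(D)
        edge_pot = random_edge_potentials(edges, L)
        samples = [sample_labeling(D, L, edges, edge_pot) for _ in range(10000)]
        noisy = [flip_labels(x, L, flip_rate=0.10) for x in samples]

The sample count of 10,000 matches the smallest sample size (10K) mentioned in the Dataset Splits row; the tree, potentials, and flip rate are placeholders to be varied as in the quoted setup.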