On Sparse Gaussian Chain Graph Models

Authors: Calvin McCarter, Seyoung Kim

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our approach on simulated and genomic datasets. In the simulation study, we considered two scenarios for the true models: CGGM-based and linear-regression-based Gaussian chain graph models. We evaluated performance in terms of graph-structure recovery and prediction accuracy in both supervised and semi-supervised settings. We applied the two types of three-layer chain graph models to single-nucleotide-polymorphism (SNP), gene-expression, and phenotype data from the pancreatic islets study for diabetic mice [18].
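As a rough illustration of the CGGM-based scenario described above, one layer of a Gaussian chain graph conditions a set of variables y on the previous layer x via p(y|x) ∝ exp(-½ yᵀΛy − xᵀΘy), which implies y|x ~ N(−Λ⁻¹Θᵀx, Λ⁻¹). The sketch below samples from such a layer; the toy sizes, dense parameter matrices, and random seed are assumptions for brevity (the paper uses sparse parameters and much larger dimensions).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cggm_layer(X, Theta, Lam, rng):
    """Sample Y | X from one CGGM layer:
    p(y|x) ∝ exp(-0.5 y^T Lam y - x^T Theta y),
    i.e. y | x ~ N(-Lam^{-1} Theta^T x, Lam^{-1})."""
    Lam_inv = np.linalg.inv(Lam)
    mean = -X @ Theta @ Lam_inv  # (n, K); uses symmetry of Lam_inv
    noise = rng.multivariate_normal(
        np.zeros(Lam.shape[0]), Lam_inv, size=X.shape[0])
    return mean + noise

# Toy sizes; the paper uses J=500, K=100, L=50 with sparse Theta/Lam.
J, K, n = 6, 4, 600
X = rng.normal(size=(n, J))
Theta = rng.normal(scale=0.1, size=(J, K))  # dense stand-in for a sparse matrix
Lam = 2.0 * np.eye(K)                       # must be positive definite
Y = sample_cggm_layer(X, Theta, Lam, rng)
```

Stacking another such layer on top of Y (to produce z) gives the three-layer chain graph used in the paper's experiments.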
Researcher Affiliation | Academia | Calvin McCarter, Machine Learning Department, Carnegie Mellon University (calvinm@cmu.edu); Seyoung Kim, Lane Center for Computational Biology, Carnegie Mellon University (sssykim@cs.cmu.edu)
Pseudocode | No | No pseudocode or algorithm blocks explicitly labeled as such were found in the paper.
Open Source Code | No | The paper does not provide any statement or link indicating that source code for its methodology is publicly available.
Open Datasets | Yes | We applied the two types of three-layer chain graph models to single-nucleotide-polymorphism (SNP), gene-expression, and phenotype data from the pancreatic islets study for diabetic mice [18].
Dataset Splits | Yes | Of the total 506 samples, we used 406 as the training set, of which 100 were held out as a validation set to select regularization parameters, and used the remaining 100 samples as the test set to evaluate prediction accuracy.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions using 'the optimization methods in [20] for CGGM-based models and the MRCE procedure [16] for linear-regression-based models', but does not name specific software packages with version numbers.
Experiment Setup | Yes | To simulate data, we assumed problem sizes of J=500, K=100, and L=50 for x, y, and z, respectively, and generated samples from known true models. Each dataset consisted of 600 samples, of which 400 and 200 were used as training and test sets. To select the regularization parameters, we estimated a model on 300 training samples, evaluated prediction error on the other 100 samples in the training set, and selected the values with the lowest prediction error.
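The split-and-select protocol quoted above (600 samples → 400 train / 200 test, with 300/100 of the training set used for fitting and validation) can be sketched as follows. This is a minimal illustration only: a small ridge regression and a hypothetical regularization grid stand in for the paper's sparse chain-graph estimators, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data standing in for one simulated dataset (the paper uses
# J=500 predictors and multivariate outputs).
n, p = 600, 10
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + rng.normal(scale=0.5, size=n)

# 400 training / 200 test; 300 of the training samples for fitting,
# 100 held out to pick the regularization parameter.
idx = rng.permutation(n)
train, test = idx[:400], idx[400:]
fit_rows, val_rows = train[:300], train[300:]

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate; stand-in for the sparse estimators."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

grid = [0.01, 0.1, 1.0, 10.0]  # hypothetical grid; the paper does not list one
best_lam = min(
    grid,
    key=lambda lam: mse(ridge_fit(X[fit_rows], y[fit_rows], lam),
                        X[val_rows], y[val_rows]))
test_err = mse(ridge_fit(X[train], y[train], best_lam), X[test], y[test])
```

The same hold-out pattern applies per dataset; only the estimator and the error metric change between the CGGM-based and linear-regression-based models.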