Efficient Dimensionality Reduction for High-Dimensional Network Estimation

Authors: Safiye Celik, Benjamin Logsdon, Su-In Lee

ICML 2014

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental. We present our results on synthetic data (Sec. 4.1) and ovarian cancer gene expression data (Sec. 4.2). We compared the MGL algorithm with four other methods in terms of the performance of learning networks with latent variables...
Researcher Affiliation: Academia. Safiye Celik (SAFIYE@CS.WASHINGTON.EDU), Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195; Benjamin A. Logsdon (BLOGSDON@CS.WASHINGTON.EDU), Department of Genome Sciences, University of Washington, Seattle, WA 98195; Su-In Lee (SUINLEE@CS.WASHINGTON.EDU), Departments of Computer Science and Engineering and Genome Sciences, University of Washington, Seattle, WA 98195.
Pseudocode: No. The paper describes the learning algorithm in Section 3.1 through textual descriptions of iterative estimation steps (e.g., 'To estimate L given Z and ΘL, from Eq. 5, we solve the following problem:'). It does not provide a structured pseudocode block or algorithm figure.
Open Source Code: No. The paper states 'We implemented MGL in C' and refers to external packages used (e.g., the CRAN R package QUIC), but it does not state that the source code for the MGL implementation is publicly available. The linked project page (http://leelab.cs.washington.edu/projects/MGL) states that it contains 'Derivations of the learning algorithms and proofs', not code.
Open Datasets: Yes. We experimented with MGL on three gene expression datasets containing 10404 gene expression levels in a total of 909 patients with ovarian serous carcinoma: Tothill (269 samples) (Tothill et al., 2008), TCGA (560 samples) (TCGA, 2012), and Denkert (80 samples) (Denkert et al., 2009).
Dataset Splits: Yes. We performed 5-fold cross-validation tests within the training dataset in order to select the λ that gives the best average test log-likelihood for each method.
Hardware Specification: No. The paper does not provide specific hardware details (such as GPU/CPU models, processor types, or memory amounts) used for running its experiments. It makes no mention of the computing environment beyond general terms.
Software Dependencies: No. The paper mentions software such as the CRAN R packages QUIC, simone, and huge, Logdet PPA, and MATLAB, and states 'We implemented MGL in C'. However, it does not provide version numbers for these key software components, which are required for reproducibility.
Experiment Setup: Yes. By setting a = 0.2 and b = 0.6 in Eq. 10, we created two different data matrices... We performed 5-fold cross validation tests within the training dataset in order to select λ that gives the best average test log-likelihood for each method... Cluster count (k) was determined as 150 by BIC... In the subsequent sets of experiments (Sections 4.2.2 and 4.2.3), we use k = 150 (as determined by BIC) and λ = .004 (as chosen by CV).
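The model-selection recipe quoted above (choose λ by 5-fold cross-validation on held-out average log-likelihood) can be sketched generically. The snippet below is not the authors' MGL code: it stands in a simple shrinkage-based precision-matrix estimator for the actual network estimator, and all function names are hypothetical. It only illustrates the CV loop for picking λ.

```python
import numpy as np

def gaussian_loglik(S_test, precision):
    # Held-out Gaussian log-likelihood (up to constants):
    # log det(Theta) - tr(S_test @ Theta), where S_test is the
    # sample covariance of the held-out fold.
    _, logdet = np.linalg.slogdet(precision)
    return logdet - np.trace(S_test @ precision)

def shrinkage_precision(X, lam):
    # Stand-in estimator (NOT MGL): shrink the sample covariance
    # toward the identity with weight lam, then invert.
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    return np.linalg.inv((1 - lam) * S + lam * np.eye(p))

def select_lambda_cv(X, lambdas, n_folds=5, seed=0):
    # Pick the lambda with the best average held-out log-likelihood
    # across n_folds cross-validation folds, as in the quoted setup.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores = []
    for lam in lambdas:
        fold_scores = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate(
                [folds[j] for j in range(n_folds) if j != k])
            prec = shrinkage_precision(X[train], lam)
            S_test = np.cov(X[test], rowvar=False)
            fold_scores.append(gaussian_loglik(S_test, prec))
        scores.append(np.mean(fold_scores))
    return lambdas[int(np.argmax(scores))]
```

Usage on synthetic data: `select_lambda_cv(X, [0.001, 0.004, 0.01])` returns the grid value with the best average test log-likelihood; swapping in a sparse estimator (e.g., graphical lasso via QUIC) would follow the same loop.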