Efficient Dimensionality Reduction for High-Dimensional Network Estimation

Authors: Safiye Celik, Benjamin Logsdon, Su-In Lee

ICML 2014

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental. We present our results on synthetic data (Sec. 4.1) and ovarian cancer gene expression data (Sec. 4.2). We compared the MGL algorithm with four other methods in terms of the performance of learning networks with latent variables...
Researcher Affiliation: Academia. Safiye Celik (SAFIYE@CS.WASHINGTON.EDU), Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195; Benjamin A. Logsdon (BLOGSDON@CS.WASHINGTON.EDU), Department of Genome Sciences, University of Washington, Seattle, WA 98195; Su-In Lee (SUINLEE@CS.WASHINGTON.EDU), Departments of Computer Science and Engineering and Genome Sciences, University of Washington, Seattle, WA 98195.
Pseudocode: No. The paper describes the learning algorithm in Section 3.1 through textual descriptions of iterative estimation steps (e.g., 'To estimate L given Z and ΘL, from Eq. 5, we solve the following problem:'). It does not provide a structured pseudocode block or algorithm figure.
Open Source Code: No. The paper states 'We implemented MGL in C' and refers to external packages used (e.g., the CRAN R package QUIC), but it does not state that the source code for the MGL implementation is publicly available. The linked project page (http://leelab.cs.washington.edu/projects/MGL) states that it contains 'Derivations of the learning algorithms and proofs', not code.
Open Datasets: Yes. We experimented with MGL on three gene expression datasets containing 10404 gene expression levels in a total of 909 patients with ovarian serous carcinoma: Tothill (269 samples) (Tothill et al., 2008), TCGA (560 samples) (TCGA, 2012), and Denkert (80 samples) (Denkert et al., 2009).
Dataset Splits: Yes. We performed 5-fold cross-validation tests within the training dataset in order to select the λ that gives the best average test log-likelihood for each method.
Hardware Specification: No. The paper does not provide specific hardware details (such as GPU/CPU models, processor types, or memory amounts) used for running its experiments. It makes no mention of the computing environment beyond general terms.
Software Dependencies: No. The paper mentions software such as the CRAN R packages QUIC, simone, and huge, Logdet PPA, and MATLAB, and states 'We implemented MGL in C'. However, it does not provide version numbers for these key software components, which are required for reproducibility.
Experiment Setup: Yes. By setting a = 0.2 and b = 0.6 in Eq. 10, we created two different data matrices... We performed 5-fold cross validation tests within the training dataset in order to select λ that gives the best average test log-likelihood for each method... Cluster count (k) was determined as 150 by BIC... In the subsequent sets of experiments (Sections 4.2.2 and 4.2.3), we use k = 150 (as determined by BIC) and λ = .004 (as chosen by CV).
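The model-selection recipe quoted above (choose λ by 5-fold cross-validation on held-out average log-likelihood) can be sketched generically. The snippet below is not the authors' MGL code: it stands in a simple shrinkage-based precision-matrix estimator for the actual network estimator, and all function names are hypothetical. It only illustrates the CV loop for picking λ.

```python
import numpy as np

def gaussian_loglik(S_test, precision):
    # Held-out Gaussian log-likelihood (up to constants):
    # log det(Theta) - tr(S_test @ Theta), where S_test is the
    # sample covariance of the held-out fold.
    _, logdet = np.linalg.slogdet(precision)
    return logdet - np.trace(S_test @ precision)

def shrinkage_precision(X, lam):
    # Stand-in estimator (NOT MGL): shrink the sample covariance
    # toward the identity with weight lam, then invert.
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    return np.linalg.inv((1 - lam) * S + lam * np.eye(p))

def select_lambda_cv(X, lambdas, n_folds=5, seed=0):
    # Pick the lambda with the best average held-out log-likelihood
    # across n_folds cross-validation folds, as in the quoted setup.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores = []
    for lam in lambdas:
        fold_scores = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate(
                [folds[j] for j in range(n_folds) if j != k])
            prec = shrinkage_precision(X[train], lam)
            S_test = np.cov(X[test], rowvar=False)
            fold_scores.append(gaussian_loglik(S_test, prec))
        scores.append(np.mean(fold_scores))
    return lambdas[int(np.argmax(scores))]
```

Usage on synthetic data: `select_lambda_cv(X, [0.001, 0.004, 0.01])` returns the grid value with the best average test log-likelihood; swapping in a sparse estimator (e.g., graphical lasso via QUIC) would follow the same loop.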