Learning Modular Structures from Network Data and Node Variables

Authors: Elham Azizi, Edoardo Airoldi, James Galagan

ICML 2014

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We demonstrate the method accuracy in predicting modular structures from synthetic data and capability to learn regulatory modules in the Mycobacterium tuberculosis gene regulatory network. |
| Researcher Affiliation | Academia | Elham Azizi ELHAM@BU.EDU Bioinformatics Program, Boston University, Boston, MA 02215 USA; Edoardo M. Airoldi AIROLDI@FAS.HARVARD.EDU Department of Statistics, Harvard University, Cambridge, MA 02138 USA; James E. Galagan JGALAG@BU.EDU Departments of Biomedical Engineering and Microbiology, Boston University, Boston, MA 02215 USA |
| Pseudocode | Yes | Algorithm 1 RJMCMC for sampling parameters |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We used interaction data identified with ChIP-Seq of 50 MTB transcription factors and expression data for different induction levels of the same factors in 87 experiments, from a recent study by (Galagan et al., 2013). |
| Dataset Splits | No | The paper discusses synthetic data generation and application to real-world data, but it does not explicitly provide details on training, validation, and test dataset splits (e.g., percentages or counts) needed for reproduction. |
| Hardware Specification | Yes | It takes an average of 36 ± 8 seconds to generate 100 samples for N = 200, C = 50, R = 10 on an i5 3.30GHz Intel(R). |
| Software Dependencies | No | We used Matlab-MPI for this implementation. The software is named, but no specific version numbers are provided. |
| Experiment Setup | Yes | The inference procedure was run for 20,000 samples. Exponential prior distributions were used for number of parents assigned to each module, to avoid over-fitting. [...] module assignments were initialized by k-means clustering of variables. [...] We performed 100,000 iterations on the combination of the two datasets. [...] We set the maximum number of modules to 10 and constrained the candidate pool of regulators to the 50 ChIPped regulators only. |
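
The experiment setup mentions exponential priors on the number of parents per module as a guard against over-fitting. The Python sketch below illustrates how such a prior could enter a Metropolis-style acceptance step. It is not the authors' implementation (which used Matlab-MPI). The function names and the `rate` parameter are hypothetical, the prior is left unnormalized, and the proposal and dimension-matching terms of a full RJMCMC move are deliberately omitted.

```python
import math


def log_exp_prior(num_parents, rate=1.0):
    """Unnormalized log of an exponential prior on a module's parent count.

    `rate` is a hypothetical hyperparameter; the source does not state its value.
    """
    if num_parents < 0:
        raise ValueError("num_parents must be non-negative")
    return -rate * num_parents


def log_prior_ratio(current_parents, proposed_parents, rate=1.0):
    """Log prior ratio for a move from the current to the proposed parent count."""
    return log_exp_prior(proposed_parents, rate) - log_exp_prior(current_parents, rate)


def acceptance_probability(log_likelihood_ratio, current_parents, proposed_parents, rate=1.0):
    """Simplified acceptance probability: likelihood ratio times prior ratio.

    Assumes a symmetric proposal and ignores the Jacobian term that a
    full reversible-jump move would include.
    """
    log_ratio = log_likelihood_ratio + log_prior_ratio(current_parents, proposed_parents, rate)
    return math.exp(min(0.0, log_ratio))
```

Under these assumptions, adding a parent is accepted only when the gain in log-likelihood outweighs the penalty `rate`, which is how the prior discourages overly dense modules.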