Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Modular Structures from Network Data and Node Variables
Authors: Elham Azizi, Edoardo Airoldi, James Galagan
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the method accuracy in predicting modular structures from synthetic data and capability to learn regulatory modules in the Mycobacterium tuberculosis gene regulatory network. |
| Researcher Affiliation | Academia | Elham Azizi EMAIL Bioinformatics Program, Boston University, Boston, MA 02215 USA; Edoardo M. Airoldi EMAIL Department of Statistics, Harvard University, Cambridge, MA 02138 USA; James E. Galagan EMAIL Departments of Biomedical Engineering and Microbiology, Boston University, Boston, MA 02215 USA |
| Pseudocode | Yes | Algorithm 1 RJMCMC for sampling parameters |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We used interaction data identified with ChIP-Seq of 50 MTB transcription factors and expression data for different induction levels of the same factors in 87 experiments, from a recent study by Galagan et al. (2013). |
| Dataset Splits | No | The paper discusses synthetic data generation and application to real-world data, but it does not explicitly provide details on training, validation, and test dataset splits (e.g., percentages or counts) needed for reproduction. |
| Hardware Specification | Yes | It takes an average of 36 ± 8 seconds to generate 100 samples for N = 200, C = 50, R = 10 on an i5 3.30GHz Intel(R). |
| Software Dependencies | No | We used Matlab-MPI for this implementation. The software is named, but no specific version numbers are provided. |
| Experiment Setup | Yes | The inference procedure was run for 20,000 samples. Exponential prior distributions were used for number of parents assigned to each module, to avoid over-fitting. [...] module assignments were initialized by k-means clustering of variables. [...] We performed 100,000 iterations on the combination of the two datasets. [...] We set the maximum number of modules to 10 and constrained the candidate pool of regulators to the 50 ChIPped regulators only. |
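The Hardware Specification and Experiment Setup rows together allow a rough estimate of the compute budget needed to reproduce the runs. The sketch below is a back-of-the-envelope calculation, not a figure from the paper. It assumes that runtime scales linearly with the number of samples, that one iteration produces one sample, and that the reported timing (N = 200, C = 50, R = 10 on an Intel i5 at 3.30 GHz) carries over to the 20,000-sample and 100,000-iteration runs. It also reads the reported spread as ± 8 seconds per 100 samples. The real-data run may have used different problem sizes, so treat the output only as an order-of-magnitude guide.

```python
# Back-of-the-envelope runtime estimate (assumptions, not reported results):
#   * runtime scales linearly with the number of samples,
#   * one iteration yields one sample,
#   * the reported timing for N = 200, C = 50, R = 10 applies unchanged.
SECONDS_PER_100_SAMPLES = 36.0  # reported average time per 100 samples
SPREAD_PER_100_SAMPLES = 8.0    # reported spread, read here as +/- 8 s


def estimate_hours(num_samples: int) -> tuple[float, float, float]:
    """Return (low, mean, high) wall-clock estimates in hours.

    The spread is scaled linearly with the number of 100-sample batches,
    which gives a deliberately wide (pessimistic) range.
    """
    batches = num_samples / 100
    mean_s = batches * SECONDS_PER_100_SAMPLES
    spread_s = batches * SPREAD_PER_100_SAMPLES
    return (
        (mean_s - spread_s) / 3600,
        mean_s / 3600,
        (mean_s + spread_s) / 3600,
    )


if __name__ == "__main__":
    # 20,000 samples (synthetic runs) and 100,000 iterations (real-data run)
    for n in (20_000, 100_000):
        low, mean, high = estimate_hours(n)
        print(f"{n:>7,} samples: ~{mean:.1f} h (range {low:.1f} to {high:.1f} h)")
```

Under these assumptions, 20,000 samples take about 2 hours (roughly 1.6 to 2.4 hours) and 100,000 iterations take about 10 hours (roughly 7.8 to 12.2 hours) on comparable hardware. Scaling the spread linearly overstates the uncertainty if per-batch timings are independent, in which case it would grow only with the square root of the number of batches.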