Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data
Authors: Tianyu Liu, Yuge Wang, Rex Ying, Hongyu Zhao
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive benchmarking analysis shows our model's capacity to effectively capture gene function similarity across multiple modalities, outperforming state-of-the-art methods in gene representation learning by up to 97.5%. Moreover, we employ bioinformatics tools in conjunction with gene representations to uncover pathway enrichment, regulation causal networks, and functions of disease-associated or dosage-sensitive genes. |
| Researcher Affiliation | Academia | Tianyu Liu Yale University EMAIL Yuge Wang Yale University EMAIL Rex Ying Yale University EMAIL Hongyu Zhao* Yale University EMAIL |
| Pseudocode | Yes | Algorithm 1 Multimodal Similarity Learning Graph Neural Network (MuSe-GNN) |
| Open Source Code | Yes | Codes of MuSe-GNN: https://github.com/HelloWorldLTY/MuSe-GNN |
| Open Datasets | Yes | Leveraging 82 training datasets from 10 tissues, three sequencing techniques, and three species, we create informative graph structures for model training and gene representations generation... Download links: Appendix M. ... We used MuSe-GNN to generate gene embeddings for different datasets based on an unsupervised learning framework and utilized the gene embeddings as training dataset to predict the function of genes based on k-NN classifier. |
| Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits with specific percentages or counts for reproducibility. It discusses evaluation metrics and benchmarking but not a dedicated validation split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run the experiments (e.g., specific GPU/CPU models, memory, or cloud resources). |
| Software Dependencies | No | The paper mentions software like "Scanpy [104]" and tools like "ggplot2 [102]" but does not provide specific version numbers for these or other key software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | Details of hyper-parameter tuning can be found in Appendix E.2. |