Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning

Authors: Zihao Jing, Yan Sun, Yan Yi Li, Sugitha Janarthanan, Alana Deng, Pingzhao Hu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Across 29 benchmark tasks from Therapeutics Data Commons (TDC) and MoleculeNet, MuMo achieves an average improvement of 2.7% over the best-performing baseline on each task, ranking first on 22 of them, including a 27% improvement on the LD50 task. These results validate its robustness to 3D conformer noise and the effectiveness of multimodal fusion in molecular representation. The code is available at: github.com/selmiss/MuMo. In this section, we conduct comprehensive experiments to evaluate the performance, robustness, and consistency of MuMo across diverse molecular tasks. We pretrained MuMo on the ChEMBL-1.6M dataset [Gaulton et al., 2012] via masked language modeling (MLM), followed by task-specific fine-tuning (see Appendix C.4). In addition, we present ablation studies and visualization analysis to show the contribution of each component in enhancing the multimodal integration and improving the overall quality of molecular prediction.
Researcher Affiliation	Academia	Zihao Jing¹, Yan Sun¹, Yan Yi Li², Sugitha Janarthanan², Alana Deng¹, Pingzhao Hu¹,²; ¹Department of Computer Science, Western University, London, ON, Canada; ²Department of Biochemistry, Western University, London, ON, Canada; EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: Unified Batching Scheme. Input: list of unified graphs {T_1, ..., T_N}; T^(batch) ← new(T), δ_v ← 0, δ_e ← 0. Output: T^(batch). Algorithm 2: Injection Enhanced Attention. Input: sequence hiddens h_S^(t), graph T^(batch). Output: h_S^(t+1), T^(batch).
Open Source Code	Yes	The code is available at: github.com/selmiss/MuMo.
Open Datasets	Yes	Across 29 benchmark tasks from Therapeutics Data Commons (TDC) and MoleculeNet, MuMo achieves an average improvement of 2.7% over the best-performing baseline on each task, ranking first on 22 of them, including a 27% improvement on the LD50 task. We pretrained MuMo on the ChEMBL-1.6M dataset [Gaulton et al., 2012] via masked language modeling (MLM), followed by task-specific fine-tuning (see Appendix C.4). To evaluate performance and generalization ability, we benchmark MuMo on 29 tasks from three widely used platforms: 14 from the TDC [Huang et al., 2021], which provides rigorous absorption, distribution, metabolism, excretion, and toxicity (ADMET) challenges and leaderboard baselines, and 12 from MoleculeNet [Wu et al., 2018], along with 3 chemical tasks from Reaxtica [Lin et al., 2022] which enables evaluation against strong unimodal and pretrained models.
Dataset Splits	Yes	AUROC is used for classification; MAE (TDC) and RMSE (MoleculeNet) for regression. MoleculeNet tasks use scaffold split for single-objective classification; otherwise, random. Each task is run 5 times: we use the official leaderboard splits for TDC and generate 5 splits for MoleculeNet (Train:Valid:Test=8:1:1). Hyperparameters follow each baseline's official setup or defaults if unspecified. Additional details about datasets and settings are provided in Appendix C.5.
Hardware Specification	Yes	A single pretraining process will take around only 5 hours on 4x A100-80G GPUs. MuMo requires a minimum of 24 GB GPU memory for fine-tuning and can be trained on a single NVIDIA RTX 4090. The actual training time varies by task depending on the number and size of molecules. Empirically, training on a dataset with 1000 molecules typically takes 10–20 minutes. Larger GPUs or multi-GPU setups can further accelerate training. Table 16: Computing cost on QM9 in the finetuning stage. We report the hardware configuration, dataset statistics, and training runtime for fine-tuning MuMo (505M) on QM9. CPU: Intel(R) Xeon(R) Platinum 8480+; GPU Count: 2; GPU Type: NVIDIA A100-SXM4-80GB.
Software Dependencies	No	The paper does not explicitly state specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup	Yes	The basic model setup was configured with a hidden size of 768, 16 attention-mamba layers, and 12 attention heads, ensuring robust model capacity. The training batch size was set to 512, with a learning rate of 1e-4 and a cosine learning rate scheduler featuring 2000 warmup steps. We used SILU activation inside the Mamba module, layer normalization, and dropout rates of 0.1 for both attention layers. The training spanned 2 epochs with gradient accumulation and utilized mixed precision with bf16, optimizing computational efficiency.
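
The Experiment Setup row reports a learning rate of 1e-4 with a cosine scheduler and 2000 warmup steps. As a rough illustration of what such a schedule looks like, the sketch below computes the learning rate at a given step. It is not taken from the paper or its repository: the linear warmup shape, the final learning rate of 0, the total step count of 10,000, and the function name `lr_at_step` are all assumptions made for this example.

```python
import math


def lr_at_step(step, base_lr=1e-4, warmup_steps=2000,
               total_steps=10_000, min_lr=0.0):
    """Learning rate at `step` for warmup followed by cosine decay.

    base_lr and warmup_steps match the reported setup; total_steps,
    min_lr, and the linear warmup shape are illustrative assumptions.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Half-cosine decay from base_lr down to min_lr.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    for s in (0, 1000, 2000, 6000, 10_000):
        print(s, lr_at_step(s))
```

Under these assumptions the rate climbs linearly to 1e-4 by step 2000, falls to half of that midway through the decay phase, and reaches the minimum at the final step. The actual total step count depends on dataset size, batch size (512), the 2 epochs, and gradient accumulation, which the quoted text does not fully specify.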