Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Omni-Mol: Multitask Molecular Model for Any-to-any Modalities

Authors: Chengxin Hu, Hao Li, Yihe Yuan, Zezheng Song, Chenyang Zhao, Haixin Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive experiments on our datasets show that Omni-Mol achieves significant improvements across 13 tasks simultaneously, setting new state-of-the-art results among both finetuned open-source LLMs and in-context learned closed-source LLMs. Additionally, we observe that Omni-Mol scales effectively with increases in data volume and model size, indicating the model's tremendous potential under larger computational budgets. Furthermore, by analyzing the representations of models trained on progressively more tasks, we discover that the representations become increasingly similar as the number of tasks grows.
Researcher Affiliation	Academia	1 National University of Singapore 2 Independent Researcher 3 University of Maryland, College Park 4 University of California, Los Angeles {EMAIL, EMAIL}
Pseudocode	No	The paper describes the model architecture and training procedures using mathematical equations and textual descriptions, but it does not include explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	GitHub: Omni-Mol-Code; Hugging Face: Omni-Mol-Data&Weight
Open Datasets	Yes	We then collect a dataset encompassing over 16 tasks with more than 1.4 million samples, making it the largest molecular instruction-tuning dataset to date. (...) Our model achieves unified instruction tuning across 16 tasks and attains state-of-the-art performance on 13 of them. Extensive experiments further demonstrate the scalability and versatility of Omni-Mol.
Dataset Splits	Yes	Following [21], for the Forward Reaction Prediction task, we extract data from USPTO, and split the dataset into 124,384 training instances and 1,000 test instances. Partially following [11], for the Catalyst Prediction and Solvent Prediction tasks, we similarly extract data from USPTO, splitting the training/test sets into 10,079/1,015 and 67,099/7,793, respectively.
Hardware Specification	Yes	Accelerators. Training Omni-Mol costs 576 NVIDIA A100 80G GPU hours.
Software Dependencies	Yes	Software and Driver Versions. The experiments are conducted with the following key software: Python 3.12.1, PyTorch 2.5.1, Transformers 4.45.2, CUDA 12.4.
Experiment Setup	Yes	For unified tuning, we train for 15 epochs with a GAL rank of 64. For separate tuning, the model is trained for 10 epochs with the same GAL configuration. The learning rate is set to 8e-5 from the grid search for all experiments. For consistency, the random seed is set to 0. More details can be found in Appendix D.
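
The Software Dependencies and Experiment Setup rows above can be gathered into a small record for anyone attempting a rerun. The following is a minimal sketch, not the authors' code: the class name, field names, and helper functions are hypothetical, and only the numeric values and version strings come from the table. It uses the Python standard library alone, so it seeds only Python's own random number generator, and it does not check the CUDA version.

```python
from dataclasses import dataclass
from importlib import metadata
import platform
import random


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    unified_epochs: int = 15
    separate_epochs: int = 10
    gal_rank: int = 64
    learning_rate: float = 8e-5
    seed: int = 0


# Versions quoted in the Software Dependencies row.
# Keys are PyPI distribution names ("torch" for PyTorch).
REPORTED_VERSIONS = {"torch": "2.5.1", "transformers": "4.45.2"}
REPORTED_PYTHON = "3.12.1"


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report():
    """Map each component to a (reported, installed) pair for comparison."""
    report = {"python": (REPORTED_PYTHON, platform.python_version())}
    for name, reported in REPORTED_VERSIONS.items():
        report[name] = (reported, installed_version(name))
    return report


if __name__ == "__main__":
    setup = ReportedSetup()
    random.seed(setup.seed)  # seeds only Python's RNG, not PyTorch or NumPy
    for name, (reported, installed) in version_report().items():
        status = "match" if reported == installed else "differs or missing"
        print(f"{name}: reported {reported}, installed {installed} ({status})")
```

An exact string comparison is deliberately strict: a locally installed PyTorch build may carry a suffix such as a CUDA tag, in which case the sketch reports a difference that should be inspected by hand.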