Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights

Authors: Jingjing Hu, Dan Guo, Zhan Si, Deguang Liu, Yunfeng Diao, Jing Zhang, Jinxing Zhou, Meng Wang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that MOL-Mamba outperforms state-of-the-art baselines across eleven chemical-biological molecular datasets. In this section, we conduct comprehensive experiments to demonstrate the efficacy of our proposed method.
Researcher Affiliation | Academia | (1) School of Computer Science and Information Engineering, Hefei University of Technology; (2) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; (3) Department of Chemistry and Centre for Atomic Engineering of Advanced Materials, Anhui University; (4) Department of Applied Chemistry, University of Science and Technology of China
Pseudocode | Yes | Algorithm 1: Mamba block with Graph SSM
Open Source Code | Yes | Code: https://github.com/xian-sh/MOL-Mamba
Open Datasets | Yes | For molecular pretraining, we use the recently popular GEOM (Axelrod and Gomez-Bombarelli 2022), which contains 50k qualified molecules, following (Liu et al. 2022; Wang et al. 2023). For downstream tasks, we conduct experiments on 11 benchmark datasets from MoleculeNet (Wu et al. 2018), covering physical chemistry, biophysics, physiology, and quantum mechanics.
Dataset Splits | Yes | Each dataset uses the recommended splitting method to divide data into training/validation/test sets with a ratio of 8:1:1.
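The 8:1:1 partitioning above can be sketched as a simple random split. This is illustrative only: the paper states that each dataset uses its recommended splitting method (several MoleculeNet benchmarks use scaffold rather than random splits), and the function name here is hypothetical.

```python
import random

def split_8_1_1(items, seed=0):
    """Randomly split a dataset into train/valid/test at an 8:1:1 ratio.

    Illustrative sketch of the ratio only; the actual recommended split
    for a given MoleculeNet dataset may be scaffold-based, not random.
    """
    items = list(items)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(items)
    n = len(items)
    n_train = int(n * 0.8)
    n_valid = int(n * 0.1)
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test

train, valid, test = split_8_1_1(range(1000))
print(len(train), len(valid), len(test))  # 800 100 100
```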
Hardware Specification | Yes | We develop all code on a single NVIDIA RTX A5000 GPU.
Software Dependencies | No | The paper mentions the 'ChemDes package (Dong et al. 2015)', a '6-layer GIN (Xu et al. 2019)', and a '6-layer SchNet (Schütt et al. 2017)', but does not provide specific version numbers for these or other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For pretraining, we set the temperature coefficient in Eq. 6 to τ = 0.5, and the mask ratio to α = 10% for the mask matrix M in Eq. 10. Based on the order of magnitude of each loss, we set the loss weights as follows: Ld = Ls = Lmask = 0.1 and Lf = 20.0. For both pretraining and fine-tuning, we employ the AdamW optimizer with a learning rate of 0.0001 and a batch size of 64; training runs for 100 epochs with early stopping on the validation set.
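The reported hyperparameters can be collected into a single configuration, with the weighted pretraining objective as a plain weighted sum. This is a minimal sketch assuming the four losses are simply summed with the stated weights; the dictionary keys and the `total_loss` helper are illustrative names, not identifiers from the released code.

```python
# Hyperparameters as reported in the paper (names of the keys are ours).
CONFIG = {
    "temperature_tau": 0.5,    # temperature coefficient in Eq. 6
    "mask_ratio_alpha": 0.10,  # mask ratio for mask matrix M in Eq. 10
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "batch_size": 64,
    "epochs": 100,             # with early stopping on the validation set
}

# Loss weights set by the order of magnitude of each term.
LOSS_WEIGHTS = {"L_d": 0.1, "L_s": 0.1, "L_mask": 0.1, "L_f": 20.0}

def total_loss(losses):
    """Weighted sum of the individual pretraining loss terms.

    `losses` maps each loss name to its current scalar value, e.g.
    {"L_d": 1.0, "L_s": 1.0, "L_mask": 1.0, "L_f": 0.05}.
    """
    return sum(LOSS_WEIGHTS[name] * value for name, value in losses.items())

print(total_loss({"L_d": 1.0, "L_s": 1.0, "L_mask": 1.0, "L_f": 0.05}))
```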