Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery

Authors: Xiuyuan Hu, Guoqing Liu, Can Chen, Yang Zhao, Hao Zhang, Xue Liu

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that 3DMolFormer outperforms previous approaches in both protein-ligand docking and pocket-aware 3D drug design, highlighting its promise for structure-based drug discovery. Section 4, titled "EXPERIMENTS", details the data, baselines, ablation studies, and evaluation metrics (RMSD, Vina Score, QED, SA, and Success Rate), and presents results in Tables 1 through 7.
Researcher Affiliation | Collaboration | The author list includes affiliations with "Tsinghua University" and "McGill University" (academic institutions) as well as "Microsoft Research AI for Science" (an industry research lab). The corresponding email domains, mails.tsinghua.edu.cn and cs.mcgill.ca (academic) and microsoft.com (industry), confirm the mixed affiliation.
Pseudocode | No | The methodology is described in prose and mathematical formulas; the paper contains no clearly labeled "Pseudocode" or "Algorithm" blocks and no structured algorithmic steps.
Open Source Code | Yes | The code is available at https://github.com/HXYfighter/3DMolFormer.
Open Datasets | Yes | The paper references publicly available datasets: PDBbind (Liu et al., 2017), CrossDocked2020 (Francoeur et al., 2020), Uni-Mol (Zhou et al., 2023a), and CASF-2016 (Su et al., 2018). Additionally, Appendix B provides direct links to the data sources: "Data Source pockets for pre-training (3.2M), ligand conformations for pre-training (209M), and ground-truth protein-ligand complexes for docking fine-tuning (17K): https://github.com/deepmodeling/Uni-Mol/tree/main/unimol" and "Docked protein-ligand complexes for pre-training and test set for pocket-aware 3D drug design: https://github.com/guanjq/targetdiff."
Dataset Splits | Yes | For protein-ligand docking (Section 4.1), the paper states: "Following Uni-Mol (Zhou et al., 2023a), we use PDBbind v2020 (Liu et al., 2017) as the training set for supervised fine-tuning on protein-ligand docking and CASF-2016 (Su et al., 2018) as the test set, which includes 285 test samples," and "...results in a training set comprising 18,404 ground-truth complexes." For pocket-aware 3D drug design (Section 4.2), it mentions: "we select 100 protein pockets from the CrossDocked2020 (Francoeur et al., 2020) dataset... thereby establishing our targets for 3D drug design."
Hardware Specification | Yes | Pre-training takes less than 48 hours with 4 A100 80G GPUs, and training takes less than 24 hours with 4 A100 80G GPUs. On average, 3DMolFormer predicts a binding pose in 0.8 seconds using 1 A100 80G GPU, and the RL process for each protein pocket takes less than 8 hours using 1 A100 80G GPU and 128 CPU cores.
Software Dependencies | No | The paper mentions the GPT-2 model architecture (Radford et al., 2019) and the AdamW optimizer (Loshchilov & Hutter, 2019), but it does not specify version numbers for these or for other software libraries or languages used (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | The paper provides detailed experimental setup information: model size (92M parameters, 12 transformer layers, 12 self-attention heads, 768-dimensional embeddings), maximum sequence length 2048, and pre-training specifics (one epoch, batch size of 10K via gradient accumulation, maximum learning rate 5e-4 with 1% warmup and cosine decay, AdamW optimizer with 0.1 weight decay, composite-loss coefficient α = 1.0). Fine-tuning details are also provided: for docking, 2000 epochs, batch size 128, maximum learning rate 1e-4, 1% warmup, and cosine decay; for RL drug design, 500 RL steps, batch size 128, constant learning rate 1e-4, and σ = 100.
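The reported learning-rate schedule (linear warmup over the first 1% of steps to a peak of 5e-4, then cosine decay) can be sketched as below. This is a minimal illustration of the stated schedule, not code from the paper; the function name, step granularity, and decay-to-zero endpoint are assumptions.

```python
import math

def lr_at_step(step, total_steps, peak_lr=5e-4, warmup_frac=0.01):
    """Illustrative schedule: linear warmup for the first warmup_frac of
    steps up to peak_lr, then cosine decay toward zero (assumed endpoint)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear warmup: reaches peak_lr at the end of warmup.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

For the docking fine-tuning configuration, the same shape would apply with `peak_lr=1e-4`; the RL stage instead uses a constant 1e-4.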