Uni-Mol: A Universal 3D Molecular Representation Learning Framework

Authors: Gengmo Zhou, Zhifeng Gao, Qiankun Ding, Hang Zheng, Hongteng Xu, Zhewei Wei, Linfeng Zhang, Guolin Ke

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental To demonstrate the effectiveness of Uni-Mol, we conduct experiments on a series of downstream tasks. In the molecular property prediction tasks, Uni-Mol outperforms SOTA on 14/15 datasets on the MoleculeNet benchmark.
Researcher Affiliation Collaboration Gengmo Zhou (1, 2), Zhifeng Gao (2), Qiankun Ding (2), Hang Zheng (2), Hongteng Xu (1), Zhewei Wei (1), Linfeng Zhang (2, 3), Guolin Ke (2). Affiliations: (1) Renmin University of China; (2) DP Technology; (3) AI for Science Institute, Beijing. Emails: {zgm2015, hongtengxu, zhewei}@ruc.edu.cn; {gaozf, dingqk, zhengh, zhanglf, kegl}@dp.tech
Pseudocode Yes Algorithm 1: Corrupted Position Generation and Assignment.
  Require: $X \in \mathbb{R}^{m \times 3}$: coordinates of $m$ atoms; $r$: noise range; re_assign: whether to use re-assignment.
  1: $R = X + \delta$, where $\delta \sim \mathrm{Uniform}(-r\,\text{Å}, r\,\text{Å})$  (generate corrupted positions)
  2: if not re_assign, return $R$  (return directly if re-assignment is not needed)
  3: $D = \{ D[i, j] = \lVert X_i - R_j \rVert_2 \mid 1 \le i, j \le m \}$  (compute distances)
  4: for $i$ in random_perm(1, m) do  (greedy assignment in a random order)
  5:   $k = \arg\min(D_{i,:})$  (get the nearest position in the $i$-th row)
  6:   $Y_i = R_k$  (assignment)
  7:   $D_{:,k} = \infty$  (mark the $k$-th column as used)
  return $Y$  (return corrupted positions)
Open Source Code Yes The code, model, and data are made publicly available at https://github.com/dptech-corp/Uni-Mol.
Open Datasets Yes MoleculeNet [52] is a popular benchmark for molecular property prediction... we use GEOM-QM9 and GEOM-Drugs [87] dataset in this task... NRDLD [58], a commonly used dataset... PDBbind General set v.2020 [61] (19,443 complexes)... For the benchmark dataset, referring to the previous works [28; 60], we use CASF-2016 as the test set... Therefore, we also release our created benchmark dataset, and hopefully, it can help future research.
Dataset Splits Yes Following previous work GEM [13], we use scaffold splitting [88] to divide the dataset into training, validation, and test sets in the ratio of 8:1:1.
Hardware Specification Yes Molecular pretraining runs on 8 V100 GPUs (32GB memory, the same below), and the training time is about 20 hours.
Software Dependencies No The paper mentions software tools like RDKit and Fpocket, but does not provide specific version numbers for these or other key software dependencies required for reproducibility.
Experiment Setup Yes We report the detailed hyperparameter setup of Uni-Mol during pretraining in Table 6. Uni-Mol training loss is the sum of three components: atom (token) loss, coordinate loss, and pair-distance loss. ... The specific search space is shown in Table 7. ... We report the detailed hyperparameter setup for molecular conformation generation in Table 9. ... The hyperparameters we search are listed in Table 10. ... The hyperparameters used in fine-tuning are listed in Table 12.
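
The Pseudocode row above quotes Algorithm 1 (corrupted position generation and assignment). As an illustration only, the steps could be sketched in plain Python as follows. The function name `corrupt_positions`, the list-of-tuples input format, and the optional `rng` argument are assumptions made for this sketch; they are not taken from the paper or from the Uni-Mol repository.

```python
import math
import random


def corrupt_positions(coords, r, re_assign=True, rng=None):
    """Sketch of Algorithm 1: add uniform noise, then optionally re-assign.

    coords: list of (x, y, z) tuples for m atoms (hypothetical input format).
    r: noise range; each coordinate is shifted by a value drawn from [-r, r].
    """
    rng = rng or random.Random()
    m = len(coords)

    # Step 1: R = X + delta, with delta ~ Uniform(-r, r) per coordinate.
    noisy = [tuple(c + rng.uniform(-r, r) for c in atom) for atom in coords]
    if not re_assign:
        # Step 2: return the noisy positions directly.
        return noisy

    # Step 3: D[i][j] = Euclidean distance between X_i and R_j.
    dist = [[math.dist(coords[i], noisy[j]) for j in range(m)] for i in range(m)]

    # Steps 4-7: greedy assignment, visiting the atoms in a random order.
    order = list(range(m))
    rng.shuffle(order)
    assigned = [None] * m
    for i in order:
        k = min(range(m), key=lambda j: dist[i][j])  # nearest unused position
        assigned[i] = noisy[k]
        for row in dist:
            row[k] = math.inf  # mark column k as used
    return assigned
```

With `re_assign=True` the result is a permutation of the noisy positions in which each atom, visited in random order, takes the closest position that is still unused. With `re_assign=False` atom i simply keeps its own noisy position.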
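
The Dataset Splits row mentions scaffold splitting with an 8:1:1 ratio. The sketch below shows one common deterministic variant of that idea, in which whole scaffold groups are assigned to a single subset, largest groups first. It assumes the scaffold keys (for example, scaffold SMILES strings) have already been computed elsewhere. The function name `scaffold_split` and its parameters are hypothetical, and the paper does not confirm that this is the exact procedure used by Uni-Mol or GEM.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split sample indices into train/valid/test by scaffold group.

    scaffolds: one precomputed scaffold key per molecule (assumed input).
    """
    # Group molecule indices that share the same scaffold.
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Largest groups first (ties broken by first index), so rare scaffolds
    # tend to end up in the validation and test sets.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train, valid, test = [], [], []
    for group in ordered:
        # A group is never divided across subsets.
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because groups are kept whole, the realized ratio only approximates 8:1:1 when scaffold groups are large relative to the dataset.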