Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training
Authors: Minghao Xu, Jiaze Song, Keming Wu, Xiangxin Zhou, Bin Cui, Wentao Zhang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive benchmark results show the superiority of GlycanAA over existing glycan encoders and verify the further improvements achieved by PreGlycanAA. We evaluate the proposed models on the GlycanML benchmark (Xu et al., 2024). This benchmark contains a comprehensive set of 11 glycan property and function prediction tasks. Experimental results show that PreGlycanAA and GlycanAA respectively rank first and second on the benchmark, and they substantially outperform SOTA atomic-level small molecule encoders and glycan-specific monosaccharide-level encoders. We further demonstrate the effectiveness of the proposed hierarchical message passing and multi-scale mask prediction methods through extensive ablation studies. |
| Researcher Affiliation | Academia | 1Peking University 2BioGeometry 3Zhongguancun Academy 4Tsinghua University 5University of Chinese Academy of Sciences. Correspondence to: Wentao Zhang <EMAIL>. |
| Pseudocode | No | The paper describes methods using mathematical equations and structured text, for instance in Section 3.2 Hierarchical Message Passing on All-Atom Glycan Graph, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | We maintain all resources at https://github.com/kasawa1234/GlycanAA. |
| Open Datasets | Yes | We choose the GlyTouCan database (Tiemeyer et al., 2017) as the data source for its high recognition in the glycoscience domain and instant update of the latest glycan structures. We evaluate the effectiveness of the proposed models on the GlycanML benchmark (Xu et al., 2024). |
| Dataset Splits | Yes | Following the standard of the GlycanML benchmark, we conduct all experiments on seeds 0, 1 and 2 and report the mean and standard deviation of results. The evaluation is performed on the dataset of glycan taxonomy prediction for its good coverage of different kinds of glycans (#training/validation/test samples: 11,010/1,280/919, average #monosaccharides per glycan: 6.39, minimum #monosaccharides per glycan: 2, maximum #monosaccharides per glycan: 43). |
| Hardware Specification | Yes | The pre-training is conducted on a local server with 200 CPU cores and 10 NVIDIA GeForce RTX 4090 GPUs (24GB). All downstream experiments are conducted on a local server with 100 CPU cores and 4 NVIDIA GeForce RTX 4090 GPUs (24GB). All experiments are conducted on a machine with 32 CPU cores and 1 NVIDIA GeForce RTX 4090 GPU (24GB). |
| Software Dependencies | No | All implementations are based on the PyTorch (Paszke et al., 2019) and TorchDrug (Zhu et al., 2022) libraries. Specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | For pre-training and downstream task training, we implement each prediction head as a 2-layer MLP with GELU activation. The PreGlycanAA model is pre-trained with an Adam optimizer (learning rate: 5×10⁻⁴, weight decay: 1×10⁻³, batch size: 256) for 50 epochs on the curated pre-training dataset. We set the atom mask ratio ρa and the monosaccharide mask ratio ρm as 0.45 and 0.15. For GlycanAA, we train it with an Adam optimizer (learning rate: 5×10⁻⁴, weight decay: 1×10⁻³) for 50 epochs with batch size 256 on taxonomy, immunogenicity and glycosylation type prediction and for 10 epochs with batch size 32 on interaction prediction. For fine-tuning PreGlycanAA on downstream tasks, we keep other settings the same as GlycanAA except that the learning rate of the encoder part is set as one tenth of that of the following task-specific MLP predictor (i.e., encoder learning rate: 5×10⁻⁵, predictor learning rate: 5×10⁻⁴). |
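The hyperparameters quoted in the Experiment Setup row can be collected into a minimal config sketch. This is not code from the paper's repository; the dictionary names and layout are assumptions for illustration, and only the numeric values come from the quoted text.

```python
# Hypothetical config sketch of the reported hyperparameters; the
# dictionary names are illustrative, the values are from the paper.

# PreGlycanAA pre-training (Adam, curated pre-training dataset)
PRETRAIN_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 5e-4,
    "weight_decay": 1e-3,
    "batch_size": 256,
    "epochs": 50,
    "atom_mask_ratio": 0.45,           # rho_a
    "monosaccharide_mask_ratio": 0.15, # rho_m
}

# Fine-tuning PreGlycanAA on downstream tasks: the encoder's learning
# rate is one tenth of the task-specific MLP predictor's.
FINETUNE_CONFIG = {
    "optimizer": "Adam",
    "predictor_lr": 5e-4,
    "encoder_lr": 5e-4 / 10,  # = 5e-5
    "weight_decay": 1e-3,
}

print(FINETUNE_CONFIG["encoder_lr"])  # 5e-05
```

In PyTorch, the two learning rates would typically be realized as separate parameter groups passed to a single `torch.optim.Adam` instance.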