MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding
Authors: Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V Chawla, Stan Z. Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that MAPE-PPI can scale to PPI prediction with millions of PPIs with superior trade-offs between effectiveness and computational efficiency compared with state-of-the-art competitors. |
| Researcher Affiliation | Academia | Westlake University; Zhejiang University; University of Notre Dame |
| Pseudocode | Yes | The pseudo-code of the proposed MAPE-PPI framework is summarized in Algorithm 1. |
| Open Source Code | Yes | Codes are available at: https://github.com/LirongWu/MAPE-PPI. |
| Open Datasets | Yes | The STRING dataset contains 1,150,830 PPI entries of Homo sapiens from the STRING database (Szklarczyk et al., 2019)... Moreover, we apply AlphaFold2 (Jumper et al., 2021) to predict the 3D structures of all protein sequence data. |
| Dataset Splits | Yes | We split the PPIs into the training (60%), validation (20%), and testing (20%) for all baselines. (A minimal split sketch is given after the table.) |
| Hardware Specification | Yes | The experiments on both baselines and our approach are implemented based on the standard implementation using PyTorch 1.6.0 with an Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and 8 NVIDIA A100 GPUs. |
| Software Dependencies | Yes | The experiments on both baselines and our approach are implemented based on the standard implementation using PyTorch 1.6.0 |
| Experiment Setup | Yes | The following hyperparameters are set the same for all datasets and partitions: PPI encoder (GIN) with layer number Ls = 2 and hidden dimension 1024, learning rate lr = 0.001, weight decay decay = 1e-4, loss weight β = 0.25, pre-training epoch Epre = 50, PPI training epoch E = 500, thresholds ds = 2, dr = 10 Å, and neighbor number K = 5. The other dataset-specific hyperparameters are determined by the AutoML toolkit NNI with the following hyperparameter search spaces: protein encoder with layer number L = {4, 5} and hidden dimension F = {128, 256}, codebook size \|A\| = {256, 512, 1024}, mask ratio \|M\| / \|A\| = {0.1, 0.15, 0.2}, scaling factor γ = {1, 1.5, 2.0}, and loss weight η = {0.5, 1.0}. (A hedged search-space sketch is given after the table.) |
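
As a reading aid for the Experiment Setup row, the sketch below collects the quoted fixed hyperparameters into a plain Python dictionary and writes the dataset-specific search space in NNI's `choice` search-space format. It is only a sketch under these assumptions: all variable names are illustrative and are not the identifiers used in the released code.

```python
# Hedged sketch: fixed hyperparameters quoted above, plus the dataset-specific
# search space expressed in NNI's search-space format ({"_type": "choice", ...}).
# All key names are illustrative, not the identifiers from the MAPE-PPI repo.
fixed_hparams = {
    "ppi_encoder": "GIN",
    "ppi_encoder_layers": 2,    # Ls
    "ppi_hidden_dim": 1024,
    "lr": 1e-3,
    "weight_decay": 1e-4,
    "beta": 0.25,               # loss weight β
    "pretrain_epochs": 50,      # Epre
    "ppi_epochs": 500,          # E
    "d_s": 2,                   # sequential threshold ds
    "d_r": 10.0,                # radius threshold dr in Å
    "num_neighbors": 5,         # K
}

nni_search_space = {
    "protein_encoder_layers": {"_type": "choice", "_value": [4, 5]},            # L
    "protein_hidden_dim":     {"_type": "choice", "_value": [128, 256]},        # F
    "codebook_size":          {"_type": "choice", "_value": [256, 512, 1024]},  # |A|
    "mask_ratio":             {"_type": "choice", "_value": [0.1, 0.15, 0.2]},  # |M| / |A|
    "gamma":                  {"_type": "choice", "_value": [1.0, 1.5, 2.0]},   # scaling factor γ
    "eta":                    {"_type": "choice", "_value": [0.5, 1.0]},        # loss weight η
}
```

In NNI, a dictionary like `nni_search_space` would typically be stored as a `search_space.json` referenced from the experiment configuration, with the trial code reading sampled values via `nni.get_next_parameter()`.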
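
For the Dataset Splits row, the following is a minimal, generic sketch of a 60%/20%/20% split over PPI entries. It only illustrates the stated proportions and is not the partitioning code used in the paper; the `ppi_pairs` input is hypothetical.

```python
import random

def split_ppis(ppi_pairs, seed=0):
    """Generic 60/20/20 split over a list of PPI entries (illustrative only)."""
    indices = list(range(len(ppi_pairs)))
    random.Random(seed).shuffle(indices)
    n_train = int(0.6 * len(indices))
    n_val = int(0.2 * len(indices))
    train = [ppi_pairs[i] for i in indices[:n_train]]
    val = [ppi_pairs[i] for i in indices[n_train:n_train + n_val]]
    test = [ppi_pairs[i] for i in indices[n_train + n_val:]]
    return train, val, test
```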