MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding

Authors: Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V Chawla, Stan Z. Li

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that MAPE-PPI can scale to PPI prediction with millions of PPIs with superior trade-offs between effectiveness and computational efficiency than the state-of-the-art competitors.
Researcher Affiliation | Academia | 1Westlake University, 2Zhejiang University, 3University of Notre Dame
Pseudocode | Yes | The pseudo-code of the proposed MAPE-PPI framework is summarized in Algorithm 1.
Open Source Code | Yes | Codes are available at: https://github.com/LirongWu/MAPE-PPI.
Open Datasets | Yes | The STRING dataset contains 1,150,830 PPI entries of Homo sapiens from the STRING database (Szklarczyk et al., 2019)... Moreover, we apply AlphaFold2 (Jumper et al., 2021) to predict the 3D structures of all protein sequence data.
Dataset Splits | Yes | We split the PPIs into the training (60%), validation (20%), and testing (20%) for all baselines.
Hardware Specification | Yes | The experiments on both baselines and our approach are implemented based on the standard implementation using the PyTorch 1.6.0 with Intel(R) Xeon(R) Gold 6240R @ 2.40GHz CPU and 8 NVIDIA A100 GPUs.
Software Dependencies | Yes | The experiments on both baselines and our approach are implemented based on the standard implementation using the PyTorch 1.6.0.
Experiment Setup | Yes | The following hyperparameters are set the same for all datasets and partitions: PPI encoder (GIN) with layer number Ls = 2 and hidden dimension 1024, learning rate lr = 0.001, weight decay decay = 1e-4, loss weight β = 0.25, pre-training epoch Epre = 50, PPI training epoch E = 500, thresholds ds = 2, dr = 10 Å, and neighbor number K = 5. The other dataset-specific hyperparameters are determined by an AutoML toolkit NNI with the hyperparameter search spaces as: protein encoder with layer number L = {4, 5} and hidden dimension F = {128, 256}, codebook size |A| = {256, 512, 1024}, mask ratio |M| / |A| = {0.1, 0.15, 0.2}, scaling factor γ = {1, 1.5, 2.0}, and loss weight η = {0.5, 1.0}.
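
The reported setup can be written down as a small configuration sketch. The Python below is illustrative only and is not taken from the MAPE-PPI repository: all dictionary keys and function names (`FIXED`, `SEARCH_SPACE`, `iter_configs`, `to_nni_choice`, `split_indices`) are hypothetical, and the split helper assumes a uniformly random 60/20/20 partition, which the excerpt above does not confirm. The `_type`/`_value` layout follows the "choice" format used by NNI search-space files.

```python
import itertools
import random

# Hyperparameters reported as shared across all datasets and partitions.
# Key names are hypothetical; values come from the "Experiment Setup" row.
FIXED = {
    "ppi_encoder": "GIN",
    "ppi_num_layers": 2,       # Ls
    "ppi_hidden_dim": 1024,
    "lr": 1e-3,
    "weight_decay": 1e-4,
    "beta": 0.25,              # loss weight
    "pretrain_epochs": 50,     # Epre
    "ppi_epochs": 500,         # E
    "d_s": 2,                  # threshold ds
    "d_r": 10.0,               # threshold dr, in angstroms
    "k_neighbors": 5,          # K
}

# Dataset-specific search space reported for the NNI search.
SEARCH_SPACE = {
    "prot_num_layers": [4, 5],          # L
    "prot_hidden_dim": [128, 256],      # F
    "codebook_size": [256, 512, 1024],  # |A|
    "mask_ratio": [0.1, 0.15, 0.2],
    "gamma": [1.0, 1.5, 2.0],           # scaling factor
    "eta": [0.5, 1.0],                  # loss weight
}


def iter_configs(space, fixed=FIXED):
    """Yield every grid point of `space`, merged with the fixed settings."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield {**fixed, **dict(zip(keys, values))}


def to_nni_choice(space):
    """Convert plain value lists to NNI-style 'choice' entries."""
    return {k: {"_type": "choice", "_value": list(v)} for k, v in space.items()}


def split_indices(n, train_pct=60, val_pct=20, seed=0):
    """Randomly split range(n) into train/val/test index lists.

    Assumption: a uniformly random split. Integer percentages avoid
    floating-point rounding in the split sizes.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


if __name__ == "__main__":
    print(sum(1 for _ in iter_configs(SEARCH_SPACE)))  # size of the full grid
    train, val, test = split_indices(1_150_830)        # STRING PPI count
    print(len(train), len(val), len(test))
```

Under these assumptions the full grid has 2 x 2 x 3 x 3 x 3 x 2 = 216 combinations, and the 1,150,830 STRING entries split into 690,498 training, 230,166 validation, and 230,166 test PPIs. A tuner such as NNI would normally sample from this space rather than enumerate it exhaustively.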