Association Pattern-aware Fusion for Biological Entity Relationship Prediction

Authors: Lingxiang Jia, Yuchen Ying, Zunlei Feng, Zipeng Zhong, Shaolun Yao, Jiacong Hu, Mingjiang Duan, Xingen Wang, Jie Song, Mingli Song

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on three biological datasets quantitatively demonstrate that the proposed method achieves about 4%-23% hit@1 improvements compared with state-of-the-art baselines.
Researcher Affiliation | Collaboration | (1) State Key Laboratory of Blockchain and Data Security, Zhejiang University; (2) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; (3) Bangsheng Technology Co., Ltd.
Pseudocode | No | The paper describes algorithms and methods in text and mathematical formulas but does not include a distinct pseudocode block or algorithm figure.
Open Source Code | Yes | Our data and code are available at https://github.com/hry98kki/PatternBERP.
Open Datasets | Yes | In this paper, we adopt three biological entity association datasets with significant biological meaning, namely DMD (Drug-Microbe-Disease), DDC (synergistic Drug-Drug-Cell line) and DPA (Drug-target Protein-Adverse reaction), among which the DPA dataset is directly constructed. In line with DMD and DDC, we utilize preprocessing tools provided by [21] to process the original data from ADReCS-Target [44], and collect a total of 1,079 triplets that are structured into the data schema <drug, protein, adr>.
Dataset Splits | Yes | To accurately evaluate model performance and prevent overfitting, 5-fold cross-validation is used. Specifically, we randomly split the dataset into a 90% cross-validation set and a 10% independent test set. On the cross-validation set, the 5-fold cross-validation is implemented.
Hardware Specification | Yes | Furthermore, all experiments are conducted on a single NVIDIA A6000 Tensor Core GPU (48 GB) and an Intel(R) Xeon CPU with 24 cores and 500 GB of memory.
Software Dependencies | No | The paper mentions software components like 'Vanilla GIN', '1D CNN model', and 'Adam optimizer', but does not provide specific version numbers for these software dependencies or libraries.
Experiment Setup | Yes | The initialized entity embedding size d is fixed to 128. The numbers of BGNN and APF layers are both fixed to 2. The number of training epochs is set to 1,000 for the DMD and DPA datasets and 2,000 for the DDC dataset. The maximum number of hops in pattern sampling, U, is set to 3. The number of sampled patterns N is set to 100. The number of attention heads is 32 for the DMD dataset, 16 for the DDC dataset, and 4 for the DPA dataset. The loss-balancing coefficient for the bind-relation task α is fixed to 0.5. The threshold for the bind-relation prediction probability γ is fixed to 0.5. In addition, in line with the evaluation strategy of [21], 29 negative samples are randomly generated for each test triplet in four scenarios for the evaluation of all methods, and the average metrics over all test triplets are reported.
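
The split and ranking protocol quoted in the table (a 10% independent test set, 5-fold cross-validation on the remaining 90%, and hit@k computed against 29 sampled negatives per test triplet) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `split_dataset`, `hit_at_k`, and `mean_hit_at_k` are hypothetical, ties are resolved in favour of the positive triplet as an assumption, and neither the negative-sampling procedure nor the four evaluation scenarios are modelled.

```python
import random


def split_dataset(triplets, test_frac=0.1, n_folds=5, seed=0):
    """Hold out an independent test set, then build k folds on the rest."""
    items = list(triplets)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    test, cv = items[:n_test], items[n_test:]
    # Strided slicing gives folds whose sizes differ by at most one.
    folds = [cv[i::n_folds] for i in range(n_folds)]
    splits = []
    for i in range(n_folds):
        val = folds[i]
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        splits.append((train, val))
    return test, splits


def hit_at_k(pos_score, neg_scores, k=1):
    """Return 1.0 if the true triplet ranks in the top k among its negatives."""
    # Assumption: rank = 1 + number of negatives scoring strictly higher,
    # so ties are counted in favour of the positive triplet.
    rank = 1 + sum(s > pos_score for s in neg_scores)
    return 1.0 if rank <= k else 0.0


def mean_hit_at_k(scored, k=1):
    """Average hit@k over (positive_score, negative_scores) pairs."""
    return sum(hit_at_k(p, n, k) for p, n in scored) / len(scored)


if __name__ == "__main__":
    # Placeholder triplets sized like the DPA dataset (1,079 triplets).
    triplets = [(f"drug{i}", f"protein{i}", f"adr{i}") for i in range(1079)]
    test, splits = split_dataset(triplets)
    print(len(test), [len(val) for _, val in splits])
```

With 29 negatives per test triplet, each positive is ranked among 30 candidates, so a random scorer would be expected to reach a hit@1 of about 1/30.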