Continuous-Discrete Convolution for Geometry-Sequence Modeling in Proteins

Authors: Hehe Fan, Zhangyang Wang, Yi Yang, Mohan Kankanhalli

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on a range of tasks, including protein fold classification, enzyme reaction classification, gene ontology term prediction and enzyme commission number prediction, demonstrate the effectiveness of the proposed CDConv.
Researcher Affiliation | Academia | Hehe Fan (1,3), Zhangyang Wang (2), Yi Yang (1) & Mohan Kankanhalli (3); (1) Zhejiang University, (2) The University of Texas at Austin, (3) National University of Singapore
Pseudocode | No | The paper describes the proposed method using text and mathematical equations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/hehefan/Continuous-Discrete-Convolution
Open Datasets | Yes | We follow Hermosilla et al. (2021) to conduct protein fold classification on the training/validation/test splits of the SCOPe 1.75 data set of Hou et al. (2018), which in total contains 16,712 proteins with 1,195 fold classes. The 3D coordinates of the proteins were collected from the SCOPe 1.75 database (Murzin et al., 1995).
Dataset Splits | Yes | We use the dataset collected by Hermosilla et al. (2021), which includes 384 four-level EC classes and 29,215/2,562/5,651 proteins for training/validation/test, respectively.
Hardware Specification | Yes | Experiments are conducted on a single Nvidia Quadro RTX A5000 GPU and Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz.
Software Dependencies | Yes | Our model is implemented based on PyTorch-Geometric 2.0.4 and PyTorch 1.10.0 with CUDA 11.3.1 and cuDNN 8.2.0.
Experiment Setup | Yes | The number of CDConv layers h is set to 8. The sequential kernel size l is set to 11 for fold classification, 25 for reaction classification and 15 for GO term and EC number prediction. The initial radius r_o is set to 4.
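
The hyperparameters quoted in the Experiment Setup row can be collected into a small per-task lookup, which may help when trying to reproduce a run. The following is a minimal sketch only: the names `CDConvSetup`, `SEQ_KERNEL_SIZE`, `setup_for` and the task keys are hypothetical and are not taken from the authors' repository. The unit of the initial radius is not stated in the quoted text, so it is left as a bare number.

```python
from dataclasses import dataclass

# Values quoted in the Experiment Setup row; every identifier here is
# a hypothetical name chosen for illustration, not the authors' code.
NUM_CDCONV_LAYERS = 8   # "number of CDConv layers h"
INITIAL_RADIUS = 4.0    # "initial radius"; unit not given in the quoted text

# Sequential kernel size l, per task.
SEQ_KERNEL_SIZE = {
    "fold_classification": 11,
    "reaction_classification": 25,
    "go_term_prediction": 15,
    "ec_number_prediction": 15,
}


@dataclass(frozen=True)
class CDConvSetup:
    task: str
    num_layers: int
    seq_kernel_size: int
    initial_radius: float


def setup_for(task: str) -> CDConvSetup:
    """Return the quoted hyperparameters for one of the four tasks."""
    if task not in SEQ_KERNEL_SIZE:
        raise KeyError(f"unknown task: {task!r}")
    return CDConvSetup(
        task=task,
        num_layers=NUM_CDCONV_LAYERS,
        seq_kernel_size=SEQ_KERNEL_SIZE[task],
        initial_radius=INITIAL_RADIUS,
    )


if __name__ == "__main__":
    for name in SEQ_KERNEL_SIZE:
        print(setup_for(name))
```

Only the sequential kernel size varies across tasks in the quoted setup; the layer count and initial radius are shared, which is why they are kept as module-level constants here.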