Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design
Authors: Wengong Jin, Jeremy Wohlwend, Regina Barzilay, Tommi S. Jaakkola
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on three generation tasks, ranging from language modeling to SARS-CoV-2 neutralization optimization and antigen-binding antibody design. Our method is compared with a standard sequence model (Saka et al., 2021; Akbar et al., 2021) and a state-of-the-art graph generation method (You et al., 2018) tailored to antibodies. Our method not only achieves lower perplexity on test sequences but also outperforms previous baselines in property-guided antibody design tasks. |
| Researcher Affiliation | Academia | Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard; CSAIL, Massachusetts Institute of Technology |
| Pseudocode | Yes | Algorithm 1: RefineGNN decoding; Algorithm 2: ITA-based sequence optimization |
| Open Source Code | Yes | Our code is available at https://github.com/wengong-jin/RefineGNN |
| Open Datasets | Yes | The Structural Antibody Database (SAbDab) consists of 4994 antibody structures renumbered according to the IMGT numbering scheme (Lefranc et al., 2003). ... The Coronavirus Antibody Database (CoVAbDab) contains 2411 antibodies, each associated with multiple binary labels indicating whether it neutralizes a coronavirus (SARS-CoV-1 or SARS-CoV-2) at a certain epitope. |
| Dataset Splits | Yes | First, we use MMseqs2 (Steinegger & Söding, 2017) to cluster all the CDR-H3 sequences. ... We then randomly split the clusters into training, validation, and test sets with an 8:1:1 ratio. We repeat the same procedure for creating CDR-H1 and CDR-H2 splits. In total, there are 1266, 1564, and 2325 clusters for CDR-H1, H2, and H3. The size of the training, validation, and test sets for each CDR is shown in the appendix. ... For CDR-H1, the train/validation/test size is 4050, 359, and 326. For CDR-H2, the train/validation/test size is 3876, 483, and 376. For CDR-H3, the train/validation/test size is 3896, 403, and 437. (A sketch of this cluster-level split appears after the table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments. |
| Software Dependencies | No | The paper mentions the Adam optimizer but does not name any software libraries or specify dependency version numbers. |
| Experiment Setup | Yes | Hyperparameters. For RefineGNN, both its structure and sequence MPNs have four message-passing layers, with a hidden dimension of 256 and block size b = 4. All models are trained by the Adam optimizer with a learning rate of 0.001. More details are provided in the appendix. ... For AR-GNN and RefineGNN, we tried hidden dimension d_h ∈ {128, 256} and number of message-passing layers L ∈ {1, 2, 3, 4, 5}. We found d_h = 256, L = 4 worked the best for RefineGNN and d_h = 256, L = 3 worked the best for AR-GNN. For LSTM, we tried d_h ∈ {128, 256, 512, 1024}. We found d_h = 256 worked the best. All models are trained by an Adam optimizer with a dropout of 0.1 and a learning rate of 0.001. (A configuration sketch with these values appears after the table.) |
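The dataset-splits row describes a cluster-level split: MMseqs2 first groups similar CDR sequences, and whole clusters are then assigned to train/validation/test so that near-duplicate CDRs never straddle a split boundary. Below is a minimal sketch of that procedure, assuming the MMseqs2 output has already been loaded as a mapping from cluster ID to member sequences; the function and variable names are illustrative, not taken from the RefineGNN codebase.

```python
# Minimal sketch of the 8:1:1 cluster-level split described in the paper.
# Assumes MMseqs2 has already grouped CDR sequences into clusters; `clusters`
# maps a cluster ID to its member sequences. All names are illustrative.
import random

def split_by_cluster(clusters, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split whole clusters (not individual sequences) so that
    near-duplicate CDRs cannot leak across train/validation/test."""
    ids = sorted(clusters)
    random.Random(seed).shuffle(ids)
    n_train = int(ratios[0] * len(ids))
    n_valid = int(ratios[1] * len(ids))
    train_ids = ids[:n_train]
    valid_ids = ids[n_train:n_train + n_valid]
    test_ids = ids[n_train + n_valid:]
    gather = lambda subset: [seq for cid in subset for seq in clusters[cid]]
    return gather(train_ids), gather(valid_ids), gather(test_ids)

# Toy example: each split receives whole clusters only.
clusters = {
    "c1": ["ARDYW", "ARDYF"],
    "c2": ["GGSYA"],
    "c3": ["TRDLG", "TRDLA", "TRDVG"],
}
train, valid, test = split_by_cluster(clusters)
```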
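For the experiment-setup row, the following sketch wires the reported hyperparameters (hidden dimension 256, four message-passing layers, dropout 0.1, Adam with learning rate 0.001) into a stand-in PyTorch module. Only the hyperparameter values come from the paper; the module itself is a placeholder, not the paper's structure/sequence MPN.

```python
# Minimal sketch of the reported training configuration. The MPN below is a
# stand-in: real message passing over the antibody graph is omitted, and only
# the hyperparameter values (d_h, L, dropout, learning rate) come from the paper.
import torch
import torch.nn as nn

HIDDEN_DIM = 256      # d_h = 256 worked best for RefineGNN
NUM_MP_LAYERS = 4     # L = 4 message-passing layers
DROPOUT = 0.1
LEARNING_RATE = 1e-3

class StandInMPN(nn.Module):
    """Placeholder for a message-passing network with the reported sizes."""
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
                nn.ReLU(),
                nn.Dropout(DROPOUT),
            )
            for _ in range(NUM_MP_LAYERS)
        ])

    def forward(self, h):
        for layer in self.layers:
            h = h + layer(h)  # residual update per message-passing step
        return h

model = StandInMPN()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```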