Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design

Authors: Wengong Jin, Jeremy Wohlwend, Regina Barzilay, Tommi S. Jaakkola

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on three generation tasks, ranging from language modeling to SARS-CoV-2 neutralization optimization and antigen-binding antibody design. Our method is compared with a standard sequence model (Saka et al., 2021; Akbar et al., 2021) and a state-of-the-art graph generation method (You et al., 2018) tailored to antibodies. Our method not only achieves lower perplexity on test sequences but also outperforms previous baselines in property-guided antibody design tasks."
Researcher Affiliation | Academia | "Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard; CSAIL, Massachusetts Institute of Technology"
Pseudocode | Yes | "Algorithm 1: RefineGNN decoding. Algorithm 2: ITA-based sequence optimization." (A hedged sketch of the decoding loop in Algorithm 1 appears after the table.)
Open Source Code | Yes | "Our code is available at https://github.com/wengong-jin/RefineGNN"
Open Datasets | Yes | "The Structural Antibody Database (SAbDab) consists of 4994 antibody structures renumbered according to the IMGT numbering scheme (Lefranc et al., 2003). ... The Coronavirus Antibody Database (CoVAbDab) contains 2411 antibodies, each associated with multiple binary labels indicating whether it neutralizes a coronavirus (SARS-CoV-1 or SARS-CoV-2) at a certain epitope."
Dataset Splits | Yes | "First, we use MMseqs2 (Steinegger & Söding, 2017) to cluster all the CDR-H3 sequences. ... We then randomly split the clusters into training, validation, and test set with 8:1:1 ratio. We repeat the same procedure for creating CDR-H1 and CDR-H2 splits. In total, there are 1266, 1564, and 2325 clusters for CDR-H1, H2, and H3. The size of training, validation, and test sets for each CDR is shown in the appendix. ... For CDR-H1, the train/validation/test size is 4050, 359, and 326. For CDR-H2, the train/validation/test size is 3876, 483, and 376. For CDR-H3, the train/validation/test size is 3896, 403, and 437." (A sketch of this cluster-level split appears after the table.)
Hardware Specification | No | The paper does not specify the hardware used for experiments.
Software Dependencies | No | The paper mentions the Adam optimizer but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | "Hyperparameters. For RefineGNN, both its structure and sequence MPN have four message passing layers, with a hidden dimension of 256 and block size b = 4. All models are trained by the Adam optimizer with a learning rate of 0.001. More details are provided in the appendix. ... For AR-GNN and RefineGNN, we tried hidden dimension dh ∈ {128, 256} and number of message passing layers L ∈ {1, 2, 3, 4, 5}. We found dh = 256, L = 4 worked the best for RefineGNN and dh = 256, L = 3 worked the best for AR-GNN. For LSTM, we tried dh ∈ {128, 256, 512, 1024}. We found dh = 256 worked the best. All models are trained by an Adam optimizer with a dropout of 0.1 and a learning rate of 0.001." (A sketch of this training configuration appears after the table.)
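
The table cites Algorithm 1 (RefineGNN decoding) and Algorithm 2 (ITA-based sequence optimization) without reproducing them. Purely as an illustration of the generate-then-refine loop that Algorithm 1 describes, here is a minimal PyTorch sketch; ToyMPN, decode_cdr, and all tensor shapes are stand-ins of our own, not the authors' API (the real implementation is in the repository linked above).

```python
import torch
import torch.nn as nn

HIDDEN = 256    # hidden dimension reported in the table
LAYERS = 4      # message passing layers reported in the table
VOCAB = 20      # standard amino-acid alphabet


class ToyMPN(nn.Module):
    """Stand-in for the paper's message passing networks (not the real MPN)."""

    def __init__(self) -> None:
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(HIDDEN, HIDDEN) for _ in range(LAYERS)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            h = torch.relu(layer(h))
        return h


@torch.no_grad()
def decode_cdr(length: int) -> list[int]:
    """Greedy decoding with interleaved (toy) structure refinement."""
    seq_mpn, struct_mpn = ToyMPN(), ToyMPN()
    to_vocab = nn.Linear(HIDDEN, VOCAB)
    h = torch.zeros(1, HIDDEN)   # representation of the partial residue graph
    residues: list[int] = []
    for _ in range(length):
        h = seq_mpn(h)                                 # encode partial graph
        residues.append(int(to_vocab(h).argmax(-1)))   # predict next residue
        h = struct_mpn(h)   # "refine": update the state of all residues so far
    return residues


print(decode_cdr(12))
```

The point the sketch tries to capture is that the state of already-generated residues is revisited at every step as the sequence grows, which is what distinguishes iterative refinement from one-shot autoregressive generation.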
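For concreteness, the cluster-level 8:1:1 split described in the Dataset Splits row can be sketched as follows. `mmseqs easy-cluster` is a real MMseqs2 subcommand, but the `--min-seq-id 0.4` threshold, the file names, and the helper `cluster_split` are assumptions for illustration; the excerpt does not state the exact clustering parameters.

```python
import random
import subprocess
from collections import defaultdict


def cluster_split(fasta: str, prefix: str = "cdrh3", seed: int = 0):
    """Cluster CDR sequences, then split whole clusters 8:1:1."""
    # Cluster with MMseqs2. The 0.4 identity threshold is a placeholder;
    # the excerpt does not state the clustering parameters used.
    subprocess.run(
        ["mmseqs", "easy-cluster", fasta, prefix, "tmp",
         "--min-seq-id", "0.4"],
        check=True,
    )
    # easy-cluster writes <prefix>_cluster.tsv with (representative, member) rows.
    clusters: defaultdict[str, list[str]] = defaultdict(list)
    with open(f"{prefix}_cluster.tsv") as fh:
        for line in fh:
            rep, member = line.split()
            clusters[rep].append(member)

    # Shuffle and split at the cluster level, not the sequence level.
    reps = sorted(clusters)
    random.Random(seed).shuffle(reps)
    cut1, cut2 = int(0.8 * len(reps)), int(0.9 * len(reps))

    def expand(part: list[str]) -> list[str]:
        return [member for rep in part for member in clusters[rep]]

    return expand(reps[:cut1]), expand(reps[cut1:cut2]), expand(reps[cut2:])
```

Splitting by cluster rather than by individual sequence prevents near-duplicate CDRs from leaking between the training and test sets, which is the point of the MMseqs2 step.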
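Finally, the reported hyperparameters translate directly into a training configuration. In the sketch below, StandInModel is purely illustrative; only the numeric values (hidden dimension 256, four message passing layers, dropout 0.1, Adam with learning rate 0.001) come from the excerpt.

```python
import torch
import torch.nn as nn


class StandInModel(nn.Module):
    """Placeholder carrying the reported capacity knobs, not the real RefineGNN."""

    def __init__(self, hidden_dim: int = 256, num_layers: int = 4,
                 dropout: float = 0.1) -> None:
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.layers = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = self.dropout(torch.relu(layer(x)))
        return x


# dh = 256, L = 4, dropout 0.1, Adam with learning rate 0.001, as reported.
model = StandInModel(hidden_dim=256, num_layers=4, dropout=0.1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```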