Protein Discovery with Discrete Walk-Jump Sampling

Authors: Nathan C. Frey, Dan Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Andreas Loukas, Vladimir Gligorijevic, Saeed Saremi

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100% of generated samples are successfully expressed and purified and 70% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We validate our method with in vitro experiments.
Researcher Affiliation | Collaboration | 1. Prescient Design, Genentech; 2. Antibody Engineering, Genentech; 3. Department of Computer Science, New York University; 4. Center for Data Science, New York University
Pseudocode | Yes | Algorithm 1: Discrete Walk-Jump Sampling (a minimal sketch of the sampler appears below the table)
Open Source Code | Yes | https://github.com/prescient-design/walk-jump
Open Datasets | Yes | Sequences from the Observed Antibody Space (OAS) database (Olsen et al., 2022) are aligned according to the AHo numbering scheme (Honegger & Plückthun, 2001) using the ANARCI (Dunbar & Deane, 2016) package and one-hot encoded (see the encoding sketch below the table). Our model is trained only on the publicly available (Mason et al., 2021) dataset.
Dataset Splits | Yes | To avoid overfitting the estimator, we split the reference set into a fitting set and a validation set (Algo. 2). Sequence property metrics are condensed into a single scalar metric by computing the distributional conformity score and the normalized average Wasserstein distance W_property between the property distributions of samples and a validation set (a sketch of this distance appears below the table).
Hardware Specification | No | The paper mentions 'GPU time / sample' and 'GPU memory (MB)' in Table 7 but does not provide specific details on the hardware used, such as exact GPU or CPU models.
Software Dependencies | Yes | All models were trained with the AdamW (Loshchilov & Hutter, 2017) optimizer in PyTorch (Paszke et al., 2019).
Experiment Setup | Yes | We used a batch size of 256, an initial learning rate of 1 × 10^-4, and trained with early stopping (a minimal training-loop sketch follows the table).
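
The 'Pseudocode' row refers to Algorithm 1, discrete walk-jump sampling. The following is a minimal PyTorch sketch of the walk-jump idea only, not the authors' implementation: a Langevin MCMC "walk" in the noise-smoothed continuous space using a learned score, followed by a single denoising "jump" and a per-position argmax to recover a discrete sequence. The `score_fn` and `denoiser` callables, the noise level `sigma`, the step size, and the step count are placeholders.

```python
import torch

def discrete_walk_jump_sample(score_fn, denoiser, shape, sigma=0.5,
                              n_steps=100, step_size=1e-2):
    """Sketch of walk-jump sampling: Langevin walk on smoothed data, one denoising jump."""
    # Start from Gaussian noise at the smoothing scale (initialization is an assumption).
    y = sigma * torch.randn(shape)

    # "Walk": overdamped Langevin MCMC in the smoothed (continuous) space.
    for _ in range(n_steps):
        y = y + 0.5 * step_size * score_fn(y) + (step_size ** 0.5) * torch.randn_like(y)

    # "Jump": single denoising step back toward clean one-hot space.
    x_hat = denoiser(y)

    # Discretize: argmax over the amino-acid alphabet at each position.
    return x_hat.argmax(dim=-1)

# Toy usage with stand-in networks (a real score and denoiser would come from training).
dummy_score = lambda y: -y       # score of a standard Gaussian, for illustration only
dummy_denoiser = lambda y: y     # identity "denoiser", for illustration only
samples = discrete_walk_jump_sample(dummy_score, dummy_denoiser, (4, 10, 21))
print(samples.shape)             # torch.Size([4, 10])
```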
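
For the 'Open Datasets' row, the described preprocessing (AHo alignment via ANARCI, then one-hot encoding) can be illustrated as below. The alignment step is omitted and a pre-aligned string is taken as input; the 21-token alphabet (20 amino acids plus a gap character) and the helper name `one_hot_encode` are assumptions, not the repository's exact code.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus '-' for alignment gaps introduced by AHo numbering.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
AA_TO_IDX = {aa: i for i, aa in enumerate(ALPHABET)}

def one_hot_encode(aligned_seq: str) -> np.ndarray:
    """One-hot encode an AHo-aligned antibody sequence, gaps included."""
    x = np.zeros((len(aligned_seq), len(ALPHABET)), dtype=np.float32)
    for pos, aa in enumerate(aligned_seq):
        x[pos, AA_TO_IDX[aa]] = 1.0
    return x

print(one_hot_encode("EVQLV-ESGG-").shape)  # (11, 21)
```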
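
For the 'Dataset Splits' row, a normalized average Wasserstein distance between sample and validation property distributions could be computed roughly as follows. The per-property normalization by the reference standard deviation and the dictionary-of-arrays interface are assumptions rather than the paper's exact definition of W_property.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def normalized_avg_wasserstein(sample_props: dict, reference_props: dict) -> float:
    """Average 1-D Wasserstein distance over named sequence properties, scale-normalized."""
    distances = []
    for name, ref_values in reference_props.items():
        scale = np.std(ref_values) or 1.0  # assumed normalization so properties are comparable
        distances.append(wasserstein_distance(sample_props[name], ref_values) / scale)
    return float(np.mean(distances))

# Toy usage with two synthetic property distributions (e.g. charge and hydrophobicity).
rng = np.random.default_rng(0)
ref = {"charge": rng.normal(0, 1, 500), "hydrophobicity": rng.normal(0, 2, 500)}
gen = {"charge": rng.normal(0.1, 1, 500), "hydrophobicity": rng.normal(0.2, 2, 500)}
print(normalized_avg_wasserstein(gen, ref))
```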
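
For the 'Software Dependencies' and 'Experiment Setup' rows, the reported configuration (AdamW in PyTorch, batch size 256, initial learning rate 1e-4, early stopping) might look roughly like the self-contained sketch below. The toy model architecture, the denoising objective, the noise level, the early-stopping patience, and the synthetic data are all assumptions made for illustration.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for one-hot encoded antibody sequences (length 10, 21-token alphabet).
data = torch.nn.functional.one_hot(torch.randint(0, 21, (4096, 10)), 21).float()
train_loader = DataLoader(TensorDataset(data[:3584]), batch_size=256, shuffle=True)
val_loader = DataLoader(TensorDataset(data[3584:]), batch_size=256)

# Hypothetical denoiser: predicts the clean sequence from a noise-corrupted copy.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(10 * 21, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10 * 21),
)
optimizer = AdamW(model.parameters(), lr=1e-4)  # AdamW, initial learning rate 1e-4
sigma = 0.5                                     # smoothing noise level (assumed value)

def denoising_loss(batch):
    (x,) = batch
    y = x + sigma * torch.randn_like(x)         # corrupt with Gaussian noise
    return torch.nn.functional.mse_loss(model(y).view_as(x), x)

best_val, patience, bad_epochs = float("inf"), 5, 0  # patience value is an assumption
for epoch in range(100):
    model.train()
    for batch in train_loader:                  # batches of size 256
        optimizer.zero_grad()
        denoising_loss(batch).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val = sum(denoising_loss(b).item() for b in val_loader) / len(val_loader)
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                               # early stopping
```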