Protein Design with Guided Discrete Diffusion

Authors: Nate Gruver, Samuel Stanton, Nathan Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, Andrew G. Wilson

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to several therapeutic targets under locality and developability constraints, attaining a 99% expression rate and 40% binding rate in exploratory in vitro experiments.
Researcher Affiliation Collaboration New York University; Prescient Design, Genentech; Antibody Engineering, Genentech.
Pseudocode Yes Algorithm 1 Infilling with categorical denoising diffusion model
Open Source Code Yes https://github.com/ngruver/NOS
Open Datasets Yes All of our diffusion models are trained on all paired heavy and light chain sequences from OAS [56] (pOAS) combined with all sequences from SAbDab [25], aligned with ANARCI [24].
Dataset Splits No The paper mentions training datasets and evaluating on test sets, but it does not provide specific percentages or sample counts for training, validation, and test splits, nor does it cite predefined splits.
Hardware Specification No The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies No The paper mentions software components such as 'Python', 'Biopython', 'AdamW', 'ANARCI', and 'Biacore Insight (Cytiva)', but it does not provide specific version numbers for these software dependencies, which is necessary for reproducible setup.
Experiment Setup Yes The full hyperparameter settings for both objectives (beta sheets and SASA) and both corruption types (NOS-D and NOS-C) are shown in Table 2. Table 2 includes an additional hyperparameter, guidance layer, which we did not discuss at length in the main text of the paper. This parameter dictates whether we perform guidance in the first layer of the neural network (the token embeddings), as is standard in continuous diffusion models for discrete sequences, or in the final layer (the layer before the final linear head). In either case we use the same gradient descent objective and corruption process and need only change the variable to which gradient updates are propagated. Table 2 shows the hyperparameters used in just Figure 5. We train for 100 epochs with a batch size of 64, optimizing with AdamW using an initial learning rate of 5e-3 with a linear warmup.
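The Pseudocode row above cites Algorithm 1, infilling with a categorical denoising diffusion model. As a reading aid only, here is a minimal sketch of that style of infilling, assuming an absorbing-state (mask) corruption and a confidence-based unmasking schedule; the model interface, MASK_ID, and the schedule are illustrative assumptions, not the paper's exact procedure.

import torch
import torch.nn.functional as F

MASK_ID = 20  # assumed index of the mask/absorbing token in a 21-symbol amino-acid vocabulary

def infill(model, seq, infill_mask, num_steps=50):
    """Resample only the positions selected by infill_mask, keeping all other residues fixed.

    model:       any callable mapping token ids [1, L] -> logits [1, L, vocab]
    seq:         LongTensor [L] with the starting sequence
    infill_mask: BoolTensor [L], True at positions to be redesigned
    """
    x = seq.clone()
    x[infill_mask] = MASK_ID  # corrupt the editable positions to the absorbing state

    masked = set(infill_mask.nonzero(as_tuple=False).flatten().tolist())
    per_step = max(1, (len(masked) + num_steps - 1) // num_steps)  # unmask a fixed share per step

    for _ in range(num_steps):
        if not masked:
            break
        with torch.no_grad():
            probs = F.softmax(model(x.unsqueeze(0))[0], dim=-1)  # [L, vocab]
        # Commit the most confident masked positions this step; sample the rest in later steps.
        commit = sorted(masked, key=lambda i: probs[i].max().item(), reverse=True)[:per_step]
        for i in commit:
            x[i] = torch.multinomial(probs[i], 1).item()
            masked.discard(i)
    return x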
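The Experiment Setup row quotes concrete training settings (100 epochs, batch size 64, AdamW, initial learning rate 5e-3 with a linear warmup). A minimal sketch of that optimizer and schedule wiring follows; the toy model, random placeholder data, and warmup length are assumptions made only to keep the snippet self-contained, since the excerpt does not state them.

import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

VOCAB, SEQ_LEN, BATCH_SIZE, EPOCHS, LR = 21, 128, 64, 100, 5e-3
WARMUP_STEPS = 1000  # a linear warmup is reported but its length is not; assumed here

# Toy stand-in for the denoising network (the paper trains a diffusion model on pOAS + SAbDab).
model = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB))

optimizer = AdamW(model.parameters(), lr=LR)
warmup = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / WARMUP_STEPS))

for epoch in range(EPOCHS):
    for _ in range(10):  # placeholder batches standing in for the real data loader
        tokens = torch.randint(0, VOCAB, (BATCH_SIZE, SEQ_LEN))
        logits = model(tokens)
        # Token-level cross entropy as a stand-in for the diffusion training objective.
        loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tokens.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        warmup.step()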