Dynamics-Informed Protein Design with Structure Conditioning

Authors: Urszula Julia Komorowska, Simon V Mathis, Kieran Didi, Francisco Vargas, Pietro Lio, Mateja Jamnik

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the empirical effectiveness of our method by turning the open-source unconditional protein diffusion model Genie into a normal-mode-dynamics-conditional model with no retraining. Generated proteins exhibit the desired dynamical and structural properties while still being biologically plausible.
Researcher Affiliation | Academia | Urszula Julia Komorowska, Simon V Mathis, Kieran Didi, Francisco Vargas, Pietro Lio & Mateja Jamnik, Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK. {ujk21, svm34, ked48, fav25, pl219, mj201}@cam.ac.uk
Pseudocode | No | The paper references an algorithm (Algorithm 5 in Didi et al. (2023)) but does not include any pseudocode or algorithm blocks within its own text.
Open Source Code | Yes | We also make the code publicly available. Code available at https://github.com/ujk21/dyn-informed.
Open Datasets | Yes | For our custom model training, we extract all short monomeric CATHv4.3 domains (Orengo et al., 1997) for structures with high resolution (< 3 Å), of lengths between 21-112 amino acids, clustered at 95% sequence similarity to remove redundancy. The resulting dataset contained 10037 protein structures.
Dataset Splits | Yes | For our custom model training, we extract all short monomeric CATHv4.3 domains (Orengo et al., 1997)... We extract random and strain dynamics targets from the proteins in the validation set.
Hardware Specification | Yes | The conditional Genie model was trained for 4000 epochs on 4 A100 GPUs (300 A100 hours in total).
Software Dependencies | No | The paper mentions several software tools and libraries such as PyTorch, GVP, ProteinMPNN, ESMFold, AlphaFold2, and the Biotite extension Springcraft, but does not specify their version numbers.
Experiment Setup | Yes | We use the Hoogeboom schedule (Hoogeboom et al., 2022) with a 250-step DDPM discretisation scheme. The model was trained for 1000 epochs with a learning rate of 1e-4... The sampling process consisted of 250 reverse time steps... Guidance scales were time-dependent and equal to 200 α_t for strain targets and 400 α_t for random targets. Conditioning was switched on in the middle of the generation process... We set η = 0.4... The guidance scales for the dynamics term and structure term were 3000 and 2500 for 6lys; 3000 and 2000 for 3adk; 2500 and 2000 for 2hhb. These constants were scaled by the time-dependent factors: α_t for dynamics and 1.5 α_t for structure.
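
The guidance-scale arithmetic quoted in the Experiment Setup row can be illustrated with a short Python sketch. This is not the authors' code: the function names, the step-indexing convention (0 is the first reverse step, 249 the last), and reading "the middle of the generation process" as step 125 of 250 are all assumptions made for illustration. The value of α_t is taken as an input, since the summary does not define how it is computed from the noise schedule.

```python
N_STEPS = 250  # number of reverse time steps reported in the setup

# Base constants for dynamics-only conditioning, as reported.
DYNAMICS_ONLY_BASE = {"strain": 200.0, "random": 400.0}

# Base constants (dynamics term, structure term) for joint conditioning, as reported.
JOINT_BASE = {
    "6lys": (3000.0, 2500.0),
    "3adk": (3000.0, 2000.0),
    "2hhb": (2500.0, 2000.0),
}


def dynamics_only_scale(kind, alpha_t, step, switch_on_step=N_STEPS // 2):
    """Return the guidance scale for a strain or random dynamics target.

    Assumption: conditioning is off (scale 0) before `switch_on_step` and
    equal to base * alpha_t from that step onward.
    """
    if step < switch_on_step:
        return 0.0
    return DYNAMICS_ONLY_BASE[kind] * alpha_t


def joint_scales(target, alpha_t):
    """Return (dynamics_scale, structure_scale) for a joint-conditioning target.

    The reported constants are multiplied by alpha_t for the dynamics term
    and by 1.5 * alpha_t for the structure term.
    """
    dynamics_base, structure_base = JOINT_BASE[target]
    return dynamics_base * alpha_t, structure_base * 1.5 * alpha_t
```

For example, under these assumptions `joint_scales("6lys", 0.5)` gives a dynamics scale of 1500 and a structure scale of 1875, while `dynamics_only_scale("strain", 0.5, step=10)` is zero because conditioning has not yet been switched on.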