Dynamics-Informed Protein Design with Structure Conditioning
Authors: Urszula Julia Komorowska, Simon V Mathis, Kieran Didi, Francisco Vargas, Pietro Lio, Mateja Jamnik
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the empirical effectiveness of our method by turning the open-source unconditional protein diffusion model Genie into a normal-mode-dynamics-conditional model with no retraining. Generated proteins exhibit the desired dynamical and structural properties while still being biologically plausible. |
| Researcher Affiliation | Academia | Urszula Julia Komorowska, Simon V Mathis, Kieran Didi, Francisco Vargas, Pietro Lio & Mateja Jamnik, Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK {ujk21, svm34, ked48, fav25, pl219, mj201}@cam.ac.uk |
| Pseudocode | No | The paper references an algorithm (Algorithm 5 in Didi et al. (2023)) but does not include any pseudocode or algorithm blocks within its own text. |
| Open Source Code | Yes | We also make the code publicly available. Code available at https://github.com/ujk21/dyn-informed. |
| Open Datasets | Yes | For our custom model training, we extract all short monomeric CATHv4.3 domains (Orengo et al., 1997) for structures with high resolution (< 3 Å), of lengths between 21-112 amino acids, clustered 95% sequence similarity to remove redundancy. The resulting dataset contained 10037 protein structures. |
| Dataset Splits | Yes | For our custom model training, we extract all short monomeric CATHv4.3 domains (Orengo et al., 1997)... We extract random and strain dynamics targets from the proteins in the validation set. |
| Hardware Specification | Yes | The conditional Genie model was trained for 4 000 epochs on 4 A100 GPUs (300 A100 hours in total). |
| Software Dependencies | No | The paper mentions several software tools and libraries, including PyTorch, GVP, ProteinMPNN, ESMFold, AlphaFold2, and the Biotite extension Springcraft, but does not specify version numbers for any of them. |
| Experiment Setup | Yes | We use the Hoogeboom schedule (Hoogeboom et al., 2022) with a 250-step DDPM discretisation scheme. The model was trained for 1000 epochs with a learning rate of 1e-4... The sampling process consisted of 250 reverse time steps... Guidance scales for strain targets were time-dependent and equal to 200α_t for strain targets and 400α_t for random targets. Conditioning was switched on in the middle of the generation process... We set η = 0.4... The guidance scales for the dynamics term and structure term were 3000 and 2500 for 6lys; 3000 and 2000 for 3adk; 2500 and 2000 for 2hhb. These constants were scaled by the time-dependent factors: α_t for dynamics and 1.5 α_t for structure. |
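
The time-dependent guidance scales quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code, and the following are assumptions rather than details from the paper: the function names `placeholder_alpha` and `guidance_scale`, the polynomial form used for α_t (the summary does not state the exact schedule), and the reading of "switched on in the middle" as reverse step 125 of 250.

```python
import numpy as np

NUM_STEPS = 250  # number of reverse time steps stated in the setup


def placeholder_alpha(num_steps: int = NUM_STEPS, s: float = 1e-5) -> np.ndarray:
    """Hypothetical polynomial schedule alpha[t] for t = 0..num_steps.

    Assumption: stands in for the schedule used in the paper, whose exact
    form is not given in this summary.
    """
    t = np.arange(num_steps + 1) / num_steps
    return (1.0 - 2.0 * s) * (1.0 - t**2) + s


def guidance_scale(
    base: float,
    alpha_t: float,
    step: int,
    num_steps: int = NUM_STEPS,
    factor: float = 1.0,
    start_fraction: float = 0.5,
) -> float:
    """Return base * factor * alpha_t once conditioning is active, else 0.

    `step` counts reverse steps already taken (0 = pure noise).
    `start_fraction` = 0.5 is an assumed reading of "the middle of the
    generation process".
    """
    if step < int(start_fraction * num_steps):
        return 0.0
    return base * factor * alpha_t


# Example: dynamics and structure scales for the 6lys constants (3000, 2500)
alphas = placeholder_alpha()
step = 200                      # reverse step index
t = NUM_STEPS - step            # corresponding diffusion time index
dynamics_scale = guidance_scale(3000, alphas[t], step)
structure_scale = guidance_scale(2500, alphas[t], step, factor=1.5)
```

Swapping in the constants for 3adk (3000, 2000) or 2hhb (2500, 2000) changes only the `base` argument. The strain and random targets in the unconditional-model experiments would use bases of 200 and 400 with `factor=1.0`.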