Diffusion Language Models Are Versatile Protein Learners
Authors: Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DPLM on extensive generative and understanding tasks, spanning unconditional generation (§4.1), a variety of protein predictive downstream tasks (§4.2), and conditional tasks, including motif-scaffolding (§4.3.1), the inverse folding task (§4.3.2), and secondary-structure-guided controllable generation (§4.3.3). |
| Researcher Affiliation | Collaboration | Dept. of Computer Science, Nanjing University (this work was done during Xinyou's internship at ByteDance Research); ByteDance Research. |
| Pseudocode | Yes | Algorithm 1: Sampling from RDM (see the hedged sampling sketch after this table). |
| Open Source Code | No | The paper does not provide an explicit statement of open-source code release for DPLM or a direct link to its repository. |
| Open Datasets | Yes | The pre-training procedure for DPLM utilizes the UniRef50 database (Suzek et al., 2015), which comprises around 45 million protein sequences, totaling about 14 billion amino acid tokens. |
| Dataset Splits | No | The paper mentions pre-training on UniRef50 and fine-tuning on various downstream tasks (Thermostability, Metal Ion Binding, DeepLoc, EC, GO, HumanPPI), which are often standard benchmarks, but it does not explicitly provide the training, validation, and test splits used for these experiments within the paper's text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components and models such as 'ESMFold' and 'OmegaFold' that were used, but it does not provide specific version numbers for these or for other software dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | We train all models for 100K updates, with a batch size of 320K for the 150M model and 1M for the 650M/3B models. |
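
For orientation, the pseudocode row above refers to sampling from a reparameterized discrete diffusion model (RDM). The sketch below illustrates the generic mask-then-unmask sampling loop such models use; the names (`sample_rdm_style`, `denoiser`, `MASK_ID`, `VOCAB_SIZE`, `num_steps`) are illustrative assumptions and do not reproduce the paper's Algorithm 1 or its released code.

```python
# Minimal sketch of mask-based discrete diffusion sampling (assumed names throughout).
import torch

MASK_ID = 0       # assumed index of the [MASK] token
VOCAB_SIZE = 33   # assumed amino-acid vocabulary size (incl. special tokens)


@torch.no_grad()
def sample_rdm_style(denoiser, length: int, num_steps: int = 100):
    """Iteratively unmask a fully-masked sequence of the given length.

    At each step the denoiser predicts logits for every position; the
    lowest-confidence positions are re-masked so that roughly
    length / num_steps new tokens are committed per step.
    """
    x = torch.full((1, length), MASK_ID, dtype=torch.long)
    for t in range(num_steps, 0, -1):
        logits = denoiser(x)                 # (1, length, VOCAB_SIZE)
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)       # per-position confidence and prediction
        # Number of positions that should remain masked after this step.
        n_remask = int(length * (t - 1) / num_steps)
        x = pred.clone()
        if n_remask > 0:
            # Re-mask the least confident positions and revisit them next step.
            remask_idx = conf.topk(n_remask, largest=False).indices
            x[0, remask_idx[0]] = MASK_ID
    return x


if __name__ == "__main__":
    # Stand-in denoiser returning random logits, only to make the sketch runnable.
    dummy = lambda seq: torch.randn(seq.shape[0], seq.shape[1], VOCAB_SIZE)
    print(sample_rdm_style(dummy, length=64, num_steps=8))
```

The re-masking schedule here is a simple linear one; the paper's Algorithm 1 may use a different noise schedule and confidence criterion.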