Dirichlet Diffusion Score Model for Biological Sequence Generation
Authors: Pavel Avdeyev, Chenlai Shi, Yuhao Tan, Kseniia Dudnyk, Jian Zhou
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that this technique can generate samples that satisfy hard constraints using a Sudoku generation task. This generative model can also solve Sudoku, including hard puzzles, without additional training. Finally, we applied this approach to develop the first human promoter DNA sequence design model and showed that designed sequences share similar properties with natural promoter sequences. |
| Researcher Affiliation | Academia | Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, USA. |
| Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | Code available at https://github.com/jzhoulab/ddsm |
| Open Datasets | Yes | FANTOM CAGE datasets were downloaded from https://fantom.gsc.riken.jp/5/datafiles/latest/. ... The human genome sequences are retrieved from hg38 |
| Dataset Splits | Yes | The promoters are further split into the training, validation, and test sets based on chromosomes (chr8 and 9 for the test set, chr10 for the validation set, and all other chromosomes for the training set). |
| Hardware Specification | No | The paper mentions software like PyTorch but does not specify any particular hardware components such as GPU or CPU models used for running the experiments. |
| Software Dependencies | Yes | Table 4. Runtime of Jacobi diffusion density function computation on PyTorch 1.10.1. |
| Experiment Setup | Yes | The Sudoku transformer is a 20-block transformer architecture... For generation and solving Sudoku puzzles, we used Euler Maruyama sampler... 100k steps where k is the time-dilation factor are used. ... The Promoter Designer model has a custom-designed 1D convolutional architecture. ... The training uses s = 2/(a+b) Jacobi diffusion processes with maximum time 4. For sampling from the trained model, we used Euler Maruyama sampler with 100 steps. ... In the training set, we also introduce the same amount of random shift of up to +/- 100bp to the sequence and transcription initiation profile simultaneously. |
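
The chromosome-based split described in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the `(chrom, start, end, strand)` record layout are assumptions made for the example, and only the chromosome assignments (chr8 and chr9 for test, chr10 for validation, everything else for training) come from the paper.

```python
# Chromosome assignments as stated in the Dataset Splits row.
TEST_CHROMS = {"chr8", "chr9"}
VALID_CHROMS = {"chr10"}


def assign_split(chrom: str) -> str:
    """Return the split name for a chromosome (hypothetical helper)."""
    if chrom in TEST_CHROMS:
        return "test"
    if chrom in VALID_CHROMS:
        return "valid"
    return "train"


def split_by_chromosome(records):
    """Group records by split.

    `records` is an iterable of tuples whose first element is the
    chromosome name, e.g. (chrom, start, end, strand). This layout is
    an assumption for illustration only.
    """
    splits = {"train": [], "valid": [], "test": []}
    for rec in records:
        splits[assign_split(rec[0])].append(rec)
    return splits


# Toy promoter intervals; the coordinates are made up.
example_records = [
    ("chr1", 1000, 2024, "+"),
    ("chr8", 5000, 6024, "-"),
    ("chr9", 7000, 8024, "+"),
    ("chr10", 9000, 10024, "-"),
    ("chrX", 3000, 4024, "+"),
]
example_splits = split_by_chromosome(example_records)
```

Splitting by whole chromosomes keeps every test promoter on a chromosome the model never saw during training, which avoids leakage from overlapping or neighboring sequences.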