GFlowNet Assisted Biological Sequence Editing
Authors: Pouya M. Ghari, Alex Tseng, Gökcen Eraslan, Romain Lopez, Tommaso Biancalani, Gabriele Scalia, Ehsan Hajiramezanali
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments on a range of real-world datasets and biological applications, and our results underscore the superior performance of our proposed algorithm compared to existing state-of-the-art sequence editing methods. |
| Researcher Affiliation | Collaboration | Pouya M. Ghari University of California Irvine Alex M. Tseng Genentech Gökcen Eraslan Genentech Romain Lopez Genentech, Stanford University Tommaso Biancalani Genentech Gabriele Scalia Genentech Ehsan Hajiramezanali Genentech |
| Pseudocode | Yes | Algorithm 1 GFNSeqEditor: Sequence Editor using GFlowNet |
| Open Source Code | No | We will release the code as well upon company approval. |
| Open Datasets | Yes | TFbinding: The dataset is taken from [4] and contains all possible DNA sequences with length 8. AMP: The dataset, acquired from DBAASP [40], is curated following the approach outlined by [19]. CRE: The dataset contains putative human cis-regulatory elements (CRE) which are regulatory DNA sequences modulating gene expression. CREs were profiled via massively parallel reporter assays (MPRAs)[12]... |
| Dataset Splits | Yes | To train both the baselines and the proposed GFNSeqEditor, we divide each dataset into training, validation, and test sets with proportions of 72%, 18% and 10%, respectively. |
| Hardware Specification | Yes | All training and inferences, including GFNSeqEditor, have been conducted using a single Nvidia Quadro P6000. |
| Software Dependencies | No | The paper mentions software like PyTorch and Adam optimizer, but does not provide specific version numbers for any libraries or dependencies. |
| Experiment Setup | Yes | The flow function Fθ(·) utilized by GFNSeqEditor and the GFlowNet-E baseline is an MLP consisting of two hidden layers, each with a dimension of 2048, and \|A\| outputs corresponding to actions. ... Adam optimizer with (β₁, β₂) = (0.9, 0.999) is utilized during the training process. The learning rate for log Z in trajectory balance loss is set to 10⁻³ for all the experiments. The number of training steps for TFbinding, AMP and CRE are 5000, 10⁶ and 10⁴, respectively. |
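The architecture described in the Experiment Setup row can be sketched in PyTorch. This is a hypothetical reconstruction from the table alone, not the authors' released code: the input embedding dimension and the action-space size `NUM_ACTIONS` (the paper's |A|) are placeholders, while the two 2048-unit hidden layers, the Adam betas (0.9, 0.999), and the dedicated 10⁻³ learning rate for the trajectory-balance log Z scalar follow the quoted setup.

```python
import torch
import torch.nn as nn

NUM_ACTIONS = 8       # placeholder for |A|, the number of edit actions
STATE_DIM = 128       # placeholder input (state embedding) dimension

# MLP flow function: two hidden layers of width 2048, |A| outputs.
flow_net = nn.Sequential(
    nn.Linear(STATE_DIM, 2048),
    nn.ReLU(),
    nn.Linear(2048, 2048),
    nn.ReLU(),
    nn.Linear(2048, NUM_ACTIONS),
)

# Trajectory-balance training learns a scalar log Z alongside the
# network; the table reports a separate learning rate of 1e-3 for it.
log_z = nn.Parameter(torch.zeros(1))

optimizer = torch.optim.Adam(
    [
        {"params": flow_net.parameters()},
        {"params": [log_z], "lr": 1e-3},
    ],
    betas=(0.9, 0.999),
)

# Forward pass on a dummy batch of 4 state embeddings.
x = torch.randn(4, STATE_DIM)
logits = flow_net(x)
print(tuple(logits.shape))  # (4, 8): one logit per action
```

The per-parameter-group learning rate mirrors the paper's choice of tuning log Z separately from the flow network, a common practice in trajectory-balance GFlowNet training.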