GFlowNet Assisted Biological Sequence Editing

Authors: Pouya M. Ghari, Alex M. Tseng, Gökcen Eraslan, Romain Lopez, Tommaso Biancalani, Gabriele Scalia, Ehsan Hajiramezanali

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments on a range of real-world datasets and biological applications, and our results underscore the superior performance of our proposed algorithm compared to existing state-of-the-art sequence editing methods.
Researcher Affiliation | Collaboration | Pouya M. Ghari (University of California, Irvine); Alex M. Tseng (Genentech); Gökcen Eraslan (Genentech); Romain Lopez (Genentech, Stanford University); Tommaso Biancalani (Genentech); Gabriele Scalia (Genentech); Ehsan Hajiramezanali (Genentech)
Pseudocode | Yes | Algorithm 1, GFNSeqEditor: Sequence Editor using GFlowNet
Open Source Code | No | We will release the code as well upon company approval.
Open Datasets | Yes | TFbinding: the dataset is taken from [4] and contains all possible DNA sequences of length 8. AMP: the dataset, acquired from DBAASP [40], is curated following the approach outlined by [19]. CRE: the dataset contains putative human cis-regulatory elements (CREs), which are regulatory DNA sequences modulating gene expression. CREs were profiled via massively parallel reporter assays (MPRAs) [12]...
Dataset Splits | Yes | To train both the baselines and the proposed GFNSeqEditor, we divide each dataset into training, validation, and test sets with proportions of 72%, 18%, and 10%, respectively. (A minimal split sketch appears below the table.)
Hardware Specification | Yes | All training and inference, including GFNSeqEditor, have been conducted using a single NVIDIA Quadro P6000.
Software Dependencies | No | The paper mentions software such as PyTorch and the Adam optimizer, but does not provide version numbers for any libraries or dependencies.
Experiment Setup | Yes | The flow function Fθ(·) utilized by GFNSeqEditor and the GFlowNet-E baseline is an MLP consisting of two hidden layers, each with dimension 2048, and |A| outputs corresponding to actions. ... The Adam optimizer with (β1, β2) = (0.9, 0.999) is utilized during training. The learning rate for log Z in the trajectory balance loss is set to 10^-3 for all experiments. The numbers of training steps for TFbinding, AMP, and CRE are 5000, 10^6, and 10^4, respectively. (A hedged PyTorch sketch of this setup appears below the table.)
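
A minimal sketch of the 72/18/10 split quoted in the Dataset Splits row, assuming the raw data is a plain Python list of sequences; the helper name, fixed seed, and shuffling strategy are illustrative assumptions, since the quoted text specifies only the proportions.

```python
import random

def split_dataset(sequences, train_frac=0.72, val_frac=0.18, seed=0):
    """Shuffle and split into train/validation/test sets (72/18/10 by default).

    The fixed seed is an assumption; the paper's quoted text does not report one.
    """
    rng = random.Random(seed)
    indices = list(range(len(sequences)))
    rng.shuffle(indices)
    n_train = int(train_frac * len(indices))
    n_val = int(val_frac * len(indices))
    train = [sequences[i] for i in indices[:n_train]]
    val = [sequences[i] for i in indices[n_train:n_train + n_val]]
    test = [sequences[i] for i in indices[n_train + n_val:]]
    return train, val, test
```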
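
Reading the Experiment Setup row literally, the flow network and optimizer could be instantiated as in the following PyTorch sketch. The two 2048-unit hidden layers, the |A|-dimensional output, the Adam betas, and the 10^-3 learning rate for log Z come from the quote; the input encoding, action-space size, activation function, and base learning rate are assumptions not stated in the excerpt.

```python
import torch
import torch.nn as nn

SEQ_LEN, VOCAB = 8, 4            # assumption: length-8 DNA sequences, one-hot encoded
NUM_ACTIONS = SEQ_LEN * VOCAB    # assumption: |A| = one substitution per position/base

# Flow function F_theta: an MLP with two hidden layers of width 2048
# and |A| outputs corresponding to actions, as quoted above.
flow_net = nn.Sequential(
    nn.Linear(SEQ_LEN * VOCAB, 2048),
    nn.ReLU(),                   # activation is an assumption; the excerpt does not name one
    nn.Linear(2048, 2048),
    nn.ReLU(),
    nn.Linear(2048, NUM_ACTIONS),
)

# log Z is the learned log partition function used by the trajectory balance loss.
log_z = nn.Parameter(torch.zeros(1))

# Adam with betas (0.9, 0.999); the 1e-3 rate for log Z is from the quote,
# while the 1e-4 rate for the network weights is an assumption.
optimizer = torch.optim.Adam(
    [
        {"params": flow_net.parameters(), "lr": 1e-4},
        {"params": [log_z], "lr": 1e-3},
    ],
    betas=(0.9, 0.999),
)
```

Keeping log Z in its own parameter group mirrors common trajectory-balance practice, where the scalar partition-function estimate is given its own, typically larger, learning rate than the flow network's weights.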