GFlowNet Assisted Biological Sequence Editing
Authors: Pouya M. Ghari, Alex Tseng, Gökcen Eraslan, Romain Lopez, Tommaso Biancalani, Gabriele Scalia, Ehsan Hajiramezanali
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments on a range of real-world datasets and biological applications, and our results underscore the superior performance of our proposed algorithm compared to existing state-of-the-art sequence editing methods. |
| Researcher Affiliation | Collaboration | Pouya M. Ghari University of California Irvine Alex M. Tseng Genentech Gökcen Eraslan Genentech Romain Lopez Genentech, Stanford University Tommaso Biancalani Genentech Gabriele Scalia Genentech Ehsan Hajiramezanali Genentech |
| Pseudocode | Yes | Algorithm 1 GFNSeqEditor: Sequence Editor using GFlowNet |
| Open Source Code | No | We will release the code as well upon company approval. |
| Open Datasets | Yes | TFbinding: The dataset is taken from [4] and contains all possible DNA sequences with length 8. AMP: The dataset, acquired from DBAASP [40], is curated following the approach outlined by [19]. CRE: The dataset contains putative human cis-regulatory elements (CRE) which are regulatory DNA sequences modulating gene expression. CREs were profiled via massively parallel reporter assays (MPRAs)[12]... |
| Dataset Splits | Yes | To train both the baselines and the proposed GFNSeqEditor, we divide each dataset into training, validation, and test sets with proportions of 72%, 18% and 10%, respectively. |
| Hardware Specification | Yes | All training and inferences, including GFNSeqEditor, have been conducted using a single Nvidia Quadro P6000. |
| Software Dependencies | No | The paper mentions software like PyTorch and Adam optimizer, but does not provide specific version numbers for any libraries or dependencies. |
| Experiment Setup | Yes | The flow function Fθ(·) utilized by GFNSeqEditor and the GFlowNet-E baseline is an MLP consisting of two hidden layers, each with a dimension of 2048, and \|A\| outputs corresponding to actions. ... Adam optimizer with (β₁, β₂) = (0.9, 0.999) is utilized during the training process. The learning rate for log Z in trajectory balance loss is set to 10⁻³ for all the experiments. The number of training steps for TFbinding, AMP and CRE are 5000, 10⁶ and 10⁴, respectively. |
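The architecture described in the Experiment Setup row can be sketched in PyTorch. This is a hypothetical reconstruction from the table alone, not the authors' released code: the input embedding dimension and the action-space size `NUM_ACTIONS` (the paper's |A|) are placeholders, while the two 2048-unit hidden layers, the Adam betas (0.9, 0.999), and the dedicated 10⁻³ learning rate for the trajectory-balance log Z scalar follow the quoted setup.

```python
import torch
import torch.nn as nn

NUM_ACTIONS = 8       # placeholder for |A|, the number of edit actions
STATE_DIM = 128       # placeholder input (state embedding) dimension

# MLP flow function: two hidden layers of width 2048, |A| outputs.
flow_net = nn.Sequential(
    nn.Linear(STATE_DIM, 2048),
    nn.ReLU(),
    nn.Linear(2048, 2048),
    nn.ReLU(),
    nn.Linear(2048, NUM_ACTIONS),
)

# Trajectory-balance training learns a scalar log Z alongside the
# network; the table reports a separate learning rate of 1e-3 for it.
log_z = nn.Parameter(torch.zeros(1))

optimizer = torch.optim.Adam(
    [
        {"params": flow_net.parameters()},
        {"params": [log_z], "lr": 1e-3},
    ],
    betas=(0.9, 0.999),
)

# Forward pass on a dummy batch of 4 state embeddings.
x = torch.randn(4, STATE_DIM)
logits = flow_net(x)
print(tuple(logits.shape))  # (4, 8): one logit per action
```

The per-parameter-group learning rate mirrors the paper's choice of tuning log Z separately from the flow network, a common practice in trajectory-balance GFlowNet training.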