Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Omni-DNA: A Genomic Model Supporting Sequence Understanding, Long-context, and Textual Annotation

Authors: Zehui Li, Vallijah Subasri, Yifei Shen, Dongsheng Li, Wentao Gu, Guy-Bart Stan, Yiren Zhao, Caihua Shan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Omni-DNA establishes new state-of-the-art results on 18 of 26 evaluations drawn from Nucleotide Transformer and Genomic Benchmarks. When jointly finetuned on biologically related tasks, Omni-DNA consistently outperforms existing models and demonstrates multi-tasking abilities. Furthermore, we introduce SEQPACK, an adaptive compression mechanism that enables efficient long-context modeling by summarizing historical tokens through position-aware learnable sampling. Leveraging SEQPACK, Omni-DNA excels at enhancer-target interaction prediction, capturing distal regulatory effects over 450 kbp. Finally, we present SEQ2FUNC, a newly constructed dataset that empowers Omni-DNA to generate accurate and functionally meaningful interpretations of DNA sequences, opening new avenues for genomic analysis and discovery. We benchmark Omni-DNA integrated with SEQPACK against leading genomic foundation models across four representative task types: (i) gene regulatory element classification; (ii) pathogenic variant effect prediction; (iii) enhancer-target gene interaction mapping; and (iv) sequence-conditioned functional interpretation generation.
Researcher Affiliation	Collaboration	Zehui Li (1, 2), Vallijah Subasri (3, 4), Yifei Shen (2), Dongsheng Li (2), Wentao Gu (2), Guy-Bart Stan (1), Yiren Zhao (1), Caihua Shan (2). Affiliations: 1 Imperial College London; 2 Microsoft Research; 3 Vector Institute; 4 University Health Network. Correspondence to: {EMAIL, EMAIL}. Co-corresponding authors: {EMAIL, EMAIL}
Pseudocode	Yes	Algorithm 1: SEQPACK compression layer (inference mode). Algorithm 2: Differential Classification Wrapper.
Open Source Code	Yes	We provide the full code we use for performing the downstream tasks in the supplementary material.
Open Datasets	Yes	Omni-DNA establishes new state-of-the-art results on 18 of 26 evaluations drawn from Nucleotide Transformer and Genomic Benchmarks. We pre-trained with causal language modeling on 300 billion nucleotides from RefSeq [33]. We introduce SEQPACK. Finally, we present SEQ2FUNC, a newly constructed dataset that empowers Omni-DNA to generate accurate and functionally meaningful interpretations of DNA sequences. We adopt the curated benchmark from Cheng et al. [7], which originated from [11, 12, 34]. Starting from the full ClinVar release (14,810 unique diseases and 3M variants).
Dataset Splits	Yes	For NT Downstream tasks, we fine-tune for a maximum of 20 epochs, while for the Genomic Benchmark (GB) tasks, we use a maximum of 10 epochs. Both NT Downstream and GB tasks include a training set and a test set. For hyperparameter search, 10% of the training set is reserved as a validation set. The data are split 8:1:1 into training, validation, and test sets. We follow an 80/10/10 train/validation/test split.
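The two splitting regimes quoted above (an 8:1:1 split of a whole dataset, and a 10% validation hold-out carved from an existing training set) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the seeded shuffle, and the rule that rounding remainders go to the test or training portion are all assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_8_1_1(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle once with a fixed seed, then cut 80% / 10% / 10%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = n * 8 // 10                  # integer arithmetic avoids float rounding
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # any rounding remainder lands in test (assumption)
    return train, val, test


def hold_out_validation(train: Sequence[T], val_percent: int = 10,
                        seed: int = 0) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: reserve val_percent% of a given training set for validation."""
    shuffled = list(train)
    random.Random(seed).shuffle(shuffled)
    n_val = len(shuffled) * val_percent // 100
    return shuffled[n_val:], shuffled[:n_val]  # (reduced train, validation)
```

Keeping the test set fixed and only re-splitting the training portion, as in `hold_out_validation`, is what prevents hyperparameter search from touching test data for tasks that ship with only a train/test split.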
Hardware Specification	Yes	Each model processes 250-token contexts (~1 kbp) in batches of 384 sequences across 8 A100 40GB GPUs for 800k steps. All fine-tuning is conducted on a single NVIDIA A100 40GB GPU. We declare the computational resources used for each experiment: mostly 8× NVIDIA A100 40GB GPUs for pretraining, and 1× NVIDIA A100 80GB GPU for fine-tuning.
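A quick worked check of the pretraining budget implied by these figures. It assumes the global batch of 384 is split evenly across the 8 GPUs and treats "~1 kbp" per context as exactly 1,000 nucleotides, so the nucleotide totals are approximations.

```python
# Figures quoted in the hardware description.
CONTEXT_TOKENS = 250            # tokens per sequence
APPROX_BP_PER_CONTEXT = 1_000   # "~1 kbp" per context, taken as exact (assumption)
GLOBAL_BATCH = 384              # sequences per optimizer step
NUM_GPUS = 8
STEPS = 800_000
CORPUS_NT = 300_000_000_000     # 300 billion nucleotides in the pretraining corpus

per_gpu_batch = GLOBAL_BATCH // NUM_GPUS                       # 48 sequences per GPU per step
tokens_per_step = CONTEXT_TOKENS * GLOBAL_BATCH                # 96,000 tokens per step
total_tokens = tokens_per_step * STEPS                         # 76.8 billion tokens
approx_bp_seen = APPROX_BP_PER_CONTEXT * GLOBAL_BATCH * STEPS  # ~307.2 billion nucleotides
approx_passes = approx_bp_seen / CORPUS_NT                     # ~1.02 passes over the corpus

print(per_gpu_batch, tokens_per_step, total_tokens, approx_bp_seen, round(approx_passes, 3))
```

Under these assumptions the run covers roughly one pass over the 300-billion-nucleotide corpus, at about 4 nucleotides per BPE token.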
Software Dependencies	No	We also employ mixed precision (bf16 matmuls, fp32 norm stats), global gradient clipping at 1.0, and PyTorch FSDP layer sharding. We utilize the Hugging Face Trainer API for full-size fine-tuning of the pretrained checkpoints. Models are loaded using the AutoModelForSequenceClassification class, which automatically adds a linear layer on top for sequence classification. The text mentions PyTorch and the Hugging Face Trainer API, but does not provide specific version numbers for these software components.
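"Global gradient clipping at 1.0" means all parameter gradients are rescaled together so that their joint L2 norm does not exceed 1.0, rather than clipping each tensor separately. The dependency-free sketch below illustrates the idea on plain Python lists. It follows the same rule as PyTorch's `torch.nn.utils.clip_grad_norm_` (scale factor `max_norm / (norm + 1e-6)`, applied only when below 1), but it is an illustration, not the code used in the paper.

```python
import math
from typing import List


def clip_grad_global_norm(grads: List[List[float]], max_norm: float = 1.0,
                          eps: float = 1e-6) -> float:
    """Rescale every gradient in place so their combined L2 norm is at most max_norm.

    Returns the global norm measured before clipping.
    """
    # One norm over all parameters jointly ("global"), not one norm per tensor.
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:  # only shrink, never amplify small gradients
        for tensor in grads:
            for i in range(len(tensor)):
                tensor[i] *= clip_coef
    return total_norm
```

Because a single factor scales every tensor, the direction of the full update is preserved and only its magnitude is capped.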
Experiment Setup	Yes	Omni-DNA is pre-trained with a causal next-token prediction objective on a 300-billion-nucleotide corpus drawn from NCBI's multi-species assemblies [33]. Following Zhou et al. [48], we adopt Byte-Pair Encoding (BPE) [35] with an initial vocabulary of 4096. Each model processes 250-token contexts (~1 kbp) in batches of 384 sequences across 8 A100 40GB GPUs for 800k steps. Optimization is performed by AdamW (β1 = 0.9, β2 = 0.95, weight decay = 0.1) with a linear warm-up of 5k steps to a peak learning rate of 3×10⁻⁴, followed by linear decay to 3×10⁻⁵. We also employ mixed precision (bf16 matmuls, fp32 norm stats), global gradient clipping at 1.0, and PyTorch FSDP layer sharding. For NT Downstream tasks, we fine-tune for a maximum of 20 epochs, while for the Genomic Benchmark (GB) tasks, we use a maximum of 10 epochs. 10-fold cross-validation is performed for each model to select the best hyperparameters, provided in Table 8.
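The pretraining learning-rate schedule described above can be written as a small piecewise-linear function. This is a sketch under two assumptions the text does not state: warm-up starts from a learning rate of 0, and the linear decay ends exactly at the final (800k-th) step.

```python
def learning_rate(step: int,
                  peak_lr: float = 3e-4,
                  final_lr: float = 3e-5,
                  warmup_steps: int = 5_000,
                  total_steps: int = 800_000) -> float:
    """Linear warm-up from 0 to peak_lr, then linear decay from peak_lr to final_lr."""
    if step < warmup_steps:
        # Warm-up phase: ramp linearly from 0 (assumed starting value) to peak_lr.
        return peak_lr * step / warmup_steps
    if step >= total_steps:
        # Past the assumed decay horizon: hold at the final learning rate.
        return final_lr
    # Decay phase: interpolate linearly between peak_lr and final_lr.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr + (final_lr - peak_lr) * progress
```

For example, `learning_rate(2_500)` is half the peak (1.5×10⁻⁴), and the midpoint of the decay phase (step 402,500) gives 1.65×10⁻⁴.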