Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Context-Aware Regularization with Markovian Integration for Attention-Based Nucleotide Analysis

Authors: Mohammad Saleh Refahi, Mahdi Abavisani, Bahrad Sokhansanj, James R Brown, Gail Rosen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate CARMANIA across diverse genomic tasks, including regulatory element prediction, functional gene classification, taxonomic inference, antimicrobial resistance detection, and biosynthetic gene cluster classification. CARMANIA outperforms the previous best long-context model by at least 7%, matches state-of-the-art on shorter sequences (exceeding prior results on 20/40 tasks while running 2.5× faster), and shows particularly strong improvements on enhancer and housekeeping gene classification tasks including up to a 34% absolute gain in Matthews correlation coefficient (MCC) for enhancer prediction.
Researcher Affiliation	Collaboration	Mohammadsaleh Refahi Drexel University Philadelphia, PA Mahdi Abavisani Dataminr New York, NY Bahrad A. Sokhansanj Drexel University Philadelphia, PA James R. Brown Drexel University Philadelphia, PA Gail Rosen Drexel University Philadelphia, PA
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks. Figure 1 illustrates the proposed pretraining framework diagrammatically, but it is not pseudocode.
Open Source Code	Yes	Code available at https://github.com/EESI/carmania.
Open Datasets	Yes	We pre-train our model on three large-scale genomic datasets: (i) GRCh38 GRCh38 [2013], which provides 3B base pairs with 10 kbp and 160 kbp fragments; (ii) the Basic Genome Dataset Zhu et al. [2022], consisting of 10B base pairs across 4,600+ genomes with 10 kbp fragments; and (iii) the Scorpio Gene-Taxa Dataset Refahi et al. [2025], comprising 580M base pairs from 2,046 genomes with 4 kbp fragments. Additional details are in Appendix A.
Dataset Splits	Yes	To evaluate task-specific adaptation, we fine-tuned our model on diverse genomics classification datasets. The Genomic Benchmarks collection includes tasks such as regulatory element prediction, enhancer detection, and binary species classification Grešová et al. [2023]. For a fair comparison, we fine-tuned the human-pretrained CARMANIA model using 5-fold cross-validation and compared its performance to other state-of-the-art models trained on the human genome.
Hardware Specification	Yes	The model was trained for two epochs using PyTorch on an NVIDIA A100 GPU (80GB).
Software Dependencies	No	The paper mentions 'PyTorch' as the framework used for training but does not provide a specific version number for it or any other key software libraries.
Experiment Setup	Yes	Architecture: CARMANIA is a LLaMA-based causal Transformer tailored for genomic sequence modeling. It uses 16 attention heads (4 key-value), a window size of 128, and 5 custom Transformer layers with an embedding size of 1024 and intermediate dimension of 4608. The model uses SiLU activations and contains 83M parameters. Training Parameters, and Hardware Specifics: The model was trained for two epochs using PyTorch on an NVIDIA A100 GPU (80GB). To fit within memory constraints, batch sizes were set based on sequence length: 35 for 4 kbp, 19 for 10 kbp, and 1 for 160 kbp inputs. We used a cosine annealing schedule with an initial learning rate of 5e-4. Additional training details, including optimizer settings and gradient clipping, are listed in Supplementary Table 13.
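
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, and the cosine annealing schedule can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the class name `CarmaniaConfig`, its field names, the dictionary `BATCH_SIZE_BY_FRAGMENT_BP`, and the function `cosine_annealing_lr` are all hypothetical. The minimum learning rate of 0, the absence of warmup, and reading 1 kbp as 1,000 bp are assumptions, since the quoted text states only the schedule type and the initial learning rate of 5e-4.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CarmaniaConfig:
    """Hypothetical container; values are those quoted in the table row."""
    num_attention_heads: int = 16
    num_key_value_heads: int = 4
    window_size: int = 128
    num_layers: int = 5
    hidden_size: int = 1024          # "embedding size" in the quoted text
    intermediate_size: int = 4608
    activation: str = "silu"
    initial_lr: float = 5e-4
    epochs: int = 2


# Reported batch size per fragment length, keyed in base pairs
# (assumption: 1 kbp is taken as 1,000 bp).
BATCH_SIZE_BY_FRAGMENT_BP = {4_000: 35, 10_000: 19, 160_000: 1}


def cosine_annealing_lr(step, total_steps, initial_lr=5e-4, min_lr=0.0):
    """Learning rate at `step` under a single-cycle cosine decay.

    Assumptions (not stated in the source): no warmup, and decay
    towards `min_lr` = 0 over `total_steps` optimizer steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so out-of-range steps map to the start or end of the cycle.
    progress = min(max(step, 0), total_steps) / total_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (initial_lr - min_lr) * cosine


if __name__ == "__main__":
    cfg = CarmaniaConfig()
    total = 1_000  # hypothetical number of optimizer steps
    for step in (0, 250, 500, 750, 1_000):
        lr = cosine_annealing_lr(step, total, cfg.initial_lr)
        print(f"step {step:>5}: lr = {lr:.6f}")
```

Under these assumptions the learning rate starts at 5e-4, passes through half that value at the midpoint, and reaches 0 at the final step. The optimizer settings and gradient clipping mentioned in the row are listed in the paper's Supplementary Table 13 and are deliberately left out of the sketch.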