Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hierarchical Gradient-Based Genetic Sampling for Accurate Prediction of Biological Oscillations

Authors: Heng Rao, Yu Gu, Jason Zipeng Zhang, Ge Yu, Yang Cao, Minghan Chen

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that HGGS outperforms seven comparative sampling methods across four biological systems, highlighting its effectiveness in enhancing sampling and prediction accuracy. The paper includes sections like '5 Experiments', 'Results', 'Ablation Study', and 'Sensitivity Analysis'.
Researcher Affiliation Academia ¹College of Computer Science and Engineering, Northeastern University, Shenyang, China; ²Department of Computer Science, Wake Forest University, Winston-Salem, NC, USA; ³Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA; EMAIL, EMAIL
Pseudocode Yes Algorithm 1: HGGS for predicting biological oscillations
Input: NN model f_nn(λ; Θ); neighbors for gradient estimation K; initial sample size N; filtering ratio r; sampling cycles m_c; Multigrid Genetic Sampling budget {n_v1, n_v2}
Initialization: LHS S_Ω = {(λ^(i), y^(i))}_{i=1}^N
Output: target model f_nn(λ; Θ*)
1: Apply Gradient-based Filtering to S_Ω to generate S_Ω^(0): S_Ω^(0) ← GF(S_Ω, r, K), where S_Ω^(0) ⊂ S_Ω
2: Update f_nn(λ; Θ) by minimizing L_{|S_Ω^(0)|} in Eq. 3
3: for k = 0, 1, ..., m_c − 1 do
4:   Compute residuals l = {|f_nn(λ^(i); Θ^(k)) − y^(i)|}_{i=1}^{|S_Ω^(k)|}
5:   Stratify S_Ω^(k) into 3 subdomains by residual using a Gaussian Mixture: {S_Ωlr^(k), S_Ωmr^(k), S_Ωhr^(k)}
6:   S_Ωgs^(k+1) ← MGS(n_v1, n_v2, S_Ωmr^(k), S_Ωhr^(k))
7:   S_Ω^(k+1) ← S_Ω^(k) ∪ S_Ωgs^(k+1)  // update datasets
8:   Update f_nn(λ; Θ) by minimizing L_{|S_Ω^(k+1)|} in Eq. 3
9: end for
10: return f_nn(λ; Θ*)
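The overall control flow of Algorithm 1 can be sketched in plain Python. This is a toy illustration, not the authors' code: the Gaussian Mixture stratification is replaced by residual-quantile thirds, Multigrid Genetic Sampling by jittered resampling near high-residual points, and `target`/`predict` are stand-ins for the simulated oscillation label and the neural network.

```python
import random

def target(lam):
    return 1.0 if 0.4 < lam < 0.6 else 0.0  # hypothetical oscillatory band

def predict(lam, theta):
    return theta  # placeholder "model": a constant predictor

def hggs(n_init=100, m_c=3, r=0.2, n_s=50, seed=0):
    rng = random.Random(seed)
    # LHS-style initialization: one sample per stratum of [0, 1]
    samples = [(i + rng.random()) / n_init for i in range(n_init)]
    data = [(lam, target(lam)) for lam in samples]
    # Gradient-based Filtering stand-in: keep the r fraction of samples
    # with the largest local label change (finite difference to neighbor)
    data.sort()
    grads = [abs(data[i + 1][1] - data[i][1]) for i in range(len(data) - 1)]
    keep = max(1, int(r * len(data)))
    hard_idx = sorted(range(len(grads)), key=lambda i: -grads[i])[:keep]
    data = [data[i] for i in hard_idx]  # S^(0): boundary-heavy subset
    theta = 0.0
    for k in range(m_c):
        theta = sum(y for _, y in data) / len(data)  # "training" step
        resid = [(abs(predict(lam, theta) - y), lam) for lam, y in data]
        resid.sort(reverse=True)
        # resample near the highest-residual third (high-residual subdomain)
        hard = resid[: max(1, len(resid) // 3)]
        for _ in range(n_s):
            _, lam = rng.choice(hard)
            new = min(1.0, max(0.0, lam + rng.gauss(0.0, 0.02)))
            data.append((new, target(new)))
    return theta, data

theta, data = hggs()
```

The point of the sketch is the loop structure: filter once, then alternate training, residual-based stratification, and resampling concentrated on hard regions, exactly as lines 3-9 of the algorithm prescribe.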
Open Source Code No The paper does not contain an explicit statement about the release of open-source code, nor does it provide a link to a code repository.
Open Datasets Yes Benchmark datasets from four biological systems were used for method evaluation: the Brusselator system (Prigogine 1978), the Cell Cycle system (Liu et al. 2012), the Mitotic Promoting Factor (MPF) system (Novak and Tyson 1993), and the Activator Inhibitor system (Murray 2002). For each system, we generated 20k–70k sets of system coefficients using LHS, ran simulations to produce system dynamics, and determined the oscillatory frequency using the method of Apicella et al. (2013). This data was then used for training and testing of our method. Descriptions of the four biological systems, their corresponding ODEs, and detailed simulation settings are provided in the Appendix.
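The coefficient sets above are drawn with Latin Hypercube Sampling (LHS). A minimal pure-Python sketch of LHS over box-shaped parameter bounds (the function name and example ranges are illustrative, not taken from the paper):

```python
import random

def latin_hypercube(n, bounds, seed=0):
    """Draw n points via Latin Hypercube Sampling over (low, high)
    bounds: each dimension is split into n equal strata, one sample
    is drawn per stratum, and the strata are shuffled per dimension."""
    rng = random.Random(seed)
    dims = []
    for low, high in bounds:
        col = [low + (high - low) * (i + rng.random()) / n for i in range(n)]
        rng.shuffle(col)
        dims.append(col)
    return list(zip(*dims))  # n points, each a tuple of len(bounds) coords

# e.g. 1000 coefficient sets over two hypothetical parameter ranges
pts = latin_hypercube(1000, [(0.1, 5.0), (0.0, 2.0)])
```

Compared with plain uniform sampling, LHS guarantees every stratum of every parameter range is visited exactly once, which is why it is a common choice for seeding the initial training set.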
Dataset Splits Yes We utilized N = 10k samples for initial training and 5k samples for validation for each experiment. For a thorough evaluation, our testing data consists of four subsets, characterizing different types of the coefficient domain: overall (entire testing data), majority (non-oscillatory samples only), minority (oscillatory samples only), and boundary (top 20% of samples ranked by gradient using Eq. 4). The total size of the testing data varies between 7k–60k, depending on the oscillatory system.
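The four test subsets can be carved out mechanically once labels and gradient magnitudes are available. A hedged sketch, using a finite-difference stand-in for the Eq. 4 gradient (all names are illustrative):

```python
def split_test_subsets(data, grad, top_frac=0.20):
    """data: list of (lam, y) with y=1 for oscillatory samples;
    grad: parallel list of gradient magnitudes (stand-in for Eq. 4)."""
    overall = data
    majority = [d for d in data if d[1] == 0]   # non-oscillatory
    minority = [d for d in data if d[1] == 1]   # oscillatory
    k = max(1, int(top_frac * len(data)))
    order = sorted(range(len(data)), key=lambda i: -grad[i])
    boundary = [data[i] for i in order[:k]]     # top 20% by gradient
    return overall, majority, minority, boundary

# toy test set: oscillatory band in the middle of the parameter range
data = [(i / 10, 1 if 3 <= i <= 6 else 0) for i in range(10)]
grad = [abs(data[min(i + 1, 9)][1] - data[i][1]) for i in range(10)]
subsets = split_test_subsets(data, grad)
```

The boundary subset ends up containing the samples where the label flips, which is precisely the region the paper's gradient ranking is meant to isolate.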
Hardware Specification Yes Our algorithm was implemented using the PyTorch framework on a single NVIDIA A6000 GPU.
Software Dependencies No The paper states, 'Our algorithm was implemented using the PyTorch framework', but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup Yes The neural network (a Multi-Layer Perceptron with 3 or 4 hidden layers) was trained for 3k epochs per sampling cycle using the Adam optimizer with a learning rate of 2–2.5 × 10⁻³, employing full-batch training and early stopping. For key hyperparameters, the GF filtering ratio was set to r = 20%, with K = 5 nearest neighbors and a GF sample size of n_f = N/2. During the sampling cycles, the MGS ratio was set to n_v1 : n_v2 = 6 : 4, with an MGS sample size of n_s = N/2.
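The early-stopping criterion in the setup above is not specified further; a common patience-based variant is sketched below, with the reported hyperparameters collected into one config dict. The patience value and the loss trace are illustrative assumptions, not values from the paper.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for
    `patience` consecutive epochs (patience value is an assumption)."""
    def __init__(self, patience=100, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop

# Hyperparameters reported in the paper (full-batch Adam,
# learning rate 2-2.5e-3, 3k epochs per sampling cycle):
config = dict(hidden_layers=3, epochs_per_cycle=3000,
              lr=2.5e-3, r=0.20, K=5, mgs_ratio=(6, 4))

stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.8, 0.8, 0.8]  # illustrative validation trace
stopped_at = next(i for i, l in enumerate(losses) if stopper.step(l))
```

With full-batch training, each `step` corresponds to one epoch, so the stopper simply counts epochs since the last improvement in validation loss.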