Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Joint Velocity-Growth Flow Matching for Single-Cell Dynamics Modeling
Authors: Dongyi Wang, Yuanwei Jiang, Zhenyi Zhang, Xiang Gu, Peijie Zhou, Jian Sun
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on both synthetic and real datasets demonstrate that VGFM can capture the underlying biological dynamics accounting for mass and state variations over time, outperforming existing approaches for single-cell dynamics modeling. Our code is available at https://github.com/DongyiWang-66/VGFM. Extensive experiments on both synthetic and real single-cell datasets are conducted to evaluate our proposed approach. |
| Researcher Affiliation | Academia | 1 School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China. 2 LMAM and School of Mathematical Sciences, Peking University, Beijing, China. 3 Center for Machine Learning Research, Peking University, Beijing, China. EMAIL; EMAIL; EMAIL; EMAIL |
| Pseudocode | Yes | Algorithm 1: Training algorithm of joint velocity-growth flow matching |
| Open Source Code | Yes | Our code is available at https://github.com/DongyiWang-66/VGFM. |
| Open Datasets | Yes | Synthetic datasets. Inspired by [23, 26], we adopt the Simulation Gene dataset... We also use Dyngen [62] to simulate a scRNA-seq dataset... Additionally, inspired by [26] that employs a high-dimensional Gaussian mixture model [64]... Real-world dataset. We conduct experiments on three real-world datasets, Embryoid Body (EB) [65], CITE-seq (CITE) [66] and Pancreas [67] |
| Dataset Splits | Yes | EB (5D), CITE (5D), and CITE (50D) configurations are assessed using a hold-out strategy, same as [38], in which an intermediate time point is excluded during training. The model is then used to predict the distribution at the hold-out time, and we compute the W1 distance between the predicted and true distributions at that time point. |
| Hardware Specification | Yes | All experiments are performed on a single-core CPU without GPU acceleration, and all visualizations are based on projections onto the first two dimensions of the high-dimensional data. |
| Software Dependencies | No | The paper mentions using the "pot library" and the "geomloss library" but does not specify any version numbers for these software components. Therefore, it does not provide specific ancillary software details needed to replicate the experiment in a fully reproducible manner. |
| Experiment Setup | Yes | We employ a 3-layer (5-layer for dimensions greater than 50) MLP with 256 hidden units and Leaky ReLU activation to parameterize both the velocity field v_θ and the growth function g. Optimization is performed using the Adam optimizer with a learning rate of 10^-3 at warm-up stage and 10^-4 after applying distribution fitting loss. A warm-up stage of 500 iterations for synthetic datasets and 5000 iterations for real-world dataset is applied (only matching loss in Eq. (12)), after which the distribution fitting loss L_OT defined in Eq. (13) is applied for an additional 30 training epochs. The mini-batch size is set to 256 for all experiments, except for Dyngen, where we use a smaller batch size of 60 due to its limited number of samples. |
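
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration helper. The following is a minimal sketch in plain Python, not the authors' code: the names `SetupConfig`, `make_setup`, and `learning_rate` are hypothetical, and the assumption that the learning rate switches exactly at the first iteration after warm-up is ours, since the row gives the warm-up length in iterations and the fitting stage in epochs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""
    num_layers: int
    hidden_units: int
    activation: str
    batch_size: int
    warmup_iterations: int
    warmup_lr: float
    fitting_lr: float
    fitting_epochs: int


def make_setup(dataset: str, dim: int, synthetic: bool) -> SetupConfig:
    """Build the configuration for one experiment from the quoted rules."""
    return SetupConfig(
        # 3-layer MLP, 5-layer for dimensions greater than 50
        num_layers=5 if dim > 50 else 3,
        hidden_units=256,
        activation="LeakyReLU",
        # 256 everywhere except Dyngen, which uses 60
        batch_size=60 if dataset.lower() == "dyngen" else 256,
        # 500 warm-up iterations for synthetic data, 5000 for real-world data
        warmup_iterations=500 if synthetic else 5000,
        warmup_lr=1e-3,
        fitting_lr=1e-4,
        fitting_epochs=30,
    )


def learning_rate(cfg: SetupConfig, iteration: int) -> float:
    """Adam learning rate for a 0-based iteration index.

    Assumption: the lower rate starts at the first iteration after warm-up,
    when the distribution fitting loss is switched on.
    """
    if iteration < cfg.warmup_iterations:
        return cfg.warmup_lr
    return cfg.fitting_lr
```

For example, `make_setup("Dyngen", dim=5, synthetic=True)` gives a 3-layer network with batch size 60 and 500 warm-up iterations. The dimension passed here is only illustrative; the row does not state the dimensionality of each dataset.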