Categorical Flow Matching on Statistical Manifolds

Authors: Chaoran Cheng, Jiahan Li, Jian Peng, Ge Liu

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on real-world generative tasks ranging from image, text to biological domains further demonstrate that SFM achieves higher sampling quality and likelihood than other discrete diffusion or flow-based models.
Researcher Affiliation | Academia | Chaoran Cheng, University of Illinois Urbana-Champaign, chaoran7@illinois.edu; Jiahan Li, Peking University, lijiahanypc@pku.edu.cn; Jian Peng, University of Illinois Urbana-Champaign, jianpeng@illinois.edu; Ge Liu, University of Illinois Urbana-Champaign, geliu@illinois.edu
Pseudocode | Yes | The overall training and inference scheme is visualized in Fig. 2 and described in Alg. 2 and 3 in Appendix C. ... Algorithm 1: NLL Calculation for Discrete Data; Algorithm 2: Training SFM; Algorithm 3: Sampling from SFM. (An illustrative training sketch follows the table.)
Open Source Code | Yes | Our code is available at https://github.com/ccr-cheng/statistical-flow-matching.
Open Datasets | Yes | The binarized MNIST dataset [53] is the binarized version of the original MNIST dataset [34], obtained by thresholding the original continuous values to either 0 or 1, and can thus be viewed as a 2-class generation task with a data dimension of 28² = 784. ... The Text8 dataset [41] is a medium-size character-level corpus with a small vocabulary of 27 tokens: the 26 lowercase letters and the whitespace token. ... [7] proposed a human promoter sequence dataset containing 100k promoter sequences with the corresponding transcription initiation signal profiles. ... The datasets used in this work are publicly available. (An illustrative binarization sketch follows the table.)
Dataset Splits | Yes | We used the preprocessed binarized MNIST dataset from [53], which has a split of 50k/10k/10k. ... We followed previous work [24, 6] to use a fixed split of 90M/5M/5M with a fixed sequence length of 256. ... We used the splits from the dataset paper [7] that assign Chromosome 10 to the validation set, Chromosomes 8 and 9 to the test set, and all the other 21 human chromosomes to the training set. (A split-assignment sketch follows the table.)
Hardware Specification | Yes | Each model for binarized MNIST and promoter design was trained on a single 80GB NVIDIA A100 GPU for 6-10 hours. Each model for Text8 was trained on four 80GB NVIDIA A100 GPUs for about 7 days.
Software Dependencies | No | The paper does not specify versioned software dependencies. It mentions Python 3.8 but gives no library versions (e.g., PyTorch or CUDA).
Experiment Setup | Yes | The quantitative evaluations of NLL and Fréchet inception distance (FID) are shown in Tab. 1. ... All models were trained for 100k iterations with a batch size of 256 (approximately 510 epochs) with an initial learning rate of 3 × 10⁻⁴. ... The models were trained for a total of 3M iterations with a batch size of 512 per GPU (approximately 16 epochs), an initial learning rate of 10⁻⁴, and an exponential moving average (EMA) decay rate of 0.9999. ... Our models were trained for 200k iterations with a batch size of 256 and an initial learning rate of 5 × 10⁻⁴. (These hyperparameters are collected in a configuration sketch after the table.)
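
Illustrative sketch for the Pseudocode row. Alg. 2 (Training SFM) is only summarized above; as a rough sketch of the idea, assuming the Fisher-Rao geometry view in which categorical distributions are mapped to the positive orthant of the unit hypersphere via the square-root map, a flow-matching training step could look like the following. All names (sphere_interpolate, sfm_training_step, model) are illustrative and not taken from the authors' repository, and the sketch handles a single categorical variable per sample for brevity.

```python
# A minimal, illustrative sketch of a Riemannian flow-matching training step on the
# probability simplex via the square-root map to the unit hypersphere (Fisher-Rao
# geometry). Names and details are assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def sphere_interpolate(u0, u1, t):
    """Geodesic (slerp) interpolation on the unit sphere and its time derivative.

    u0, u1: (batch, K) unit vectors; t: (batch, 1) times in [0, 1].
    """
    cos_theta = (u0 * u1).sum(-1, keepdim=True).clamp(-1 + 1e-6, 1 - 1e-6)
    theta = torch.arccos(cos_theta)
    sin_theta = torch.sin(theta)
    u_t = (torch.sin((1 - t) * theta) * u0 + torch.sin(t * theta) * u1) / sin_theta
    # Closed-form target velocity d u_t / d t along the geodesic.
    v_t = (-theta * torch.cos((1 - t) * theta) * u0
           + theta * torch.cos(t * theta) * u1) / sin_theta
    return u_t, v_t

def sfm_training_step(model, one_hot_data, optimizer):
    """One flow-matching step: regress the model onto the geodesic velocity."""
    batch, num_classes = one_hot_data.shape
    p0 = torch.full_like(one_hot_data, 1.0 / num_classes)  # uniform prior on the simplex
    u0, u1 = p0.sqrt(), one_hot_data.sqrt()                # square-root map onto the sphere
    t = torch.rand(batch, 1)
    u_t, v_t = sphere_interpolate(u0, u1, t)
    loss = F.mse_loss(model(u_t, t), v_t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the actual method, each data point consists of many categorical variables (e.g., 784 binary pixels), and the network predicts a tangent vector field for all of them jointly; see Alg. 2 and 3 in Appendix C of the paper for the authors' exact procedure.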
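Illustrative sketch for the Open Datasets row. Binarized MNIST is described as thresholding pixel intensities to {0, 1}; the paper uses the preprocessed version from [53], but a thresholding step of the kind described could look like the sketch below. The 0.5 cutoff and the torchvision loading path are assumptions, not details from the paper.

```python
# Illustrative binarization of MNIST by thresholding pixel intensities to {0, 1}.
# The 0.5 cutoff is an assumption; the paper uses the preprocessed dataset of [53].
import torch
from torchvision import datasets, transforms

to_binary = transforms.Compose([
    transforms.ToTensor(),                                    # scale pixels to [0, 1]
    transforms.Lambda(lambda x: (x > 0.5).long().view(-1)),   # 28 * 28 = 784 binary tokens
])

train_set = datasets.MNIST(root="./data", train=True, download=True, transform=to_binary)
x, _ = train_set[0]
print(x.shape, x.unique())  # torch.Size([784]) tensor([0, 1])
```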
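Illustrative sketch for the Dataset Splits row. A hypothetical helper encoding the quoted chromosome-based assignment for the promoter dataset (validation: Chromosome 10; test: Chromosomes 8 and 9; training: the remaining 21 human chromosomes) might look like this.

```python
# Hypothetical split assignment following the chromosome-based split quoted above.
def promoter_split(chromosome: str) -> str:
    if chromosome == "chr10":
        return "valid"
    if chromosome in ("chr8", "chr9"):
        return "test"
    return "train"   # all other 21 human chromosomes

assert promoter_split("chr10") == "valid"
assert promoter_split("chr8") == "test"
assert promoter_split("chr1") == "train"
```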
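Configuration summary for the Experiment Setup row. The quoted hyperparameters can be collected into plain dictionaries; the mapping of each block to a dataset is inferred from the order of the excerpt and the Hardware Specification row, and anything not quoted (optimizer, learning-rate schedule) is left out.

```python
# Training hyperparameters as quoted in the Experiment Setup row.
# Dataset labels are inferred from the excerpt order and are not stated explicitly there.
CONFIGS = {
    "binarized_mnist": {
        "iterations": 100_000,
        "batch_size": 256,           # approximately 510 epochs
        "learning_rate": 3e-4,
    },
    "text8": {
        "iterations": 3_000_000,
        "batch_size_per_gpu": 512,   # approximately 16 epochs
        "learning_rate": 1e-4,
        "ema_decay": 0.9999,
    },
    "promoter_design": {
        "iterations": 200_000,
        "batch_size": 256,
        "learning_rate": 5e-4,
    },
}
```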