Discrete Flow Matching
Authors: Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the tasks of language modeling, code generation, and image generation. For language modeling, we compare the proposed method against prior work considering the widely used generative perplexity metric. We scale the models to 1.7 billion parameters and present results on coding tasks, i.e., HumanEval (Chen et al., 2021), MBPP (Austin et al., 2021b), demonstrating the most promising results to date in a non-autoregressive context. In image generation, we present results for a fully discrete CIFAR10 (Krizhevsky et al., 2009). |
| Researcher Affiliation | Collaboration | Itai Gat¹, Tal Remez¹, Neta Shaul², Felix Kreuk¹, Ricky T. Q. Chen¹, Gabriel Synnaeve¹, Yossi Adi¹, Yaron Lipman¹ (¹ Meta FAIR, ² Weizmann Institute of Science) |
| Pseudocode | Yes | Algorithm 1 formulates a basic sampling algorithm given a generating probability velocity $u_t$. Algorithm 1 (Flow Matching sampling). Require: velocity $u_t$, sample $X \sim p$, step size $h = 1/n$. For $t = 0, h, 2h, \ldots, 1-h$ do: $X^i \sim \delta_{X^i}(\cdot) + h\,u_t^i(\cdot, X)$, for $i \in [N]$ (eq. 24 or 22). End for; return $X$. (See the sampling sketch after this table.) |
| Open Source Code | Yes | 5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] |
| Open Datasets | Yes | Data. We use three splits of data. First is Open Web Text (Gokaslan and Cohen, 2019). Second is the same mix used in Llama-2 (Touvron et al., 2023), including textual and code data. For the code-focused models we use the same split used in Code Llama (Roziere et al., 2023). For the small models, we use Open Web Text. For the big models we use the Llama-2 and Code Llama mixes. ... We performed a fully discrete image generation... We trained an FM model with U-coupling and path as in equation 9 on CIFAR10 |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits using percentages, sample counts, or references to predefined splits with citations. It mentions training data and evaluating on benchmarks, but without specific split details. |
| Hardware Specification | Yes | To demonstrate that, we measure the average latency of the proposed method compared with the autoregressive alternative using a single A100 GPU with 80 GB of RAM. |
| Software Dependencies | No | The paper mentions software components like “DiT transformer architecture”, “GPT2 tokenizer”, “tiktoken tokenizer”, “ROPE embedding”, “Adam optimizer”, and “U-Net architecture” but does not provide specific version numbers for these, which is required for reproducibility. |
| Experiment Setup | Yes | Models are trained with Adam optimizer with β1 = 0.9 and β2 = 0.999. We use a dropout rate of 0.1. Models are trained with a warm-up of 2500 steps, with a peak learning rate of 3e-4. We train the big models with batch size of 4096 for 1.3 million iterations and the small models with batch size of 512 for 400 thousand iterations. ... We optimize the network using Adam optimizer with β1 = 0.9 and β2 = 0.999, a learning rate of 1e-4. We trained with an effective batch size of 512 for roughly 300K iterations. (See the optimizer sketch after this table.) |
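
For reference, here is a minimal PyTorch sketch of the Euler-style sampling loop quoted in the Pseudocode row (Algorithm 1). The function name `discrete_fm_sample` and the `velocity_fn(t, x)` interface are assumptions for illustration; the paper only specifies the per-coordinate update $X^i \sim \delta_{X^i}(\cdot) + h\,u_t^i(\cdot, X)$ with step size $h = 1/n$.

```python
import torch
import torch.nn.functional as F

def discrete_fm_sample(velocity_fn, x_init, n_steps):
    """Sketch of Algorithm 1 (Flow Matching sampling) for discrete data.

    velocity_fn(t, x) is assumed to return the probability velocity
    u_t^i(., X) as a (batch, N, vocab) tensor whose last axis sums to zero.
    x_init is a (batch, N) integer tensor sampled from the source p.
    """
    h = 1.0 / n_steps                        # step size h = 1/n
    x = x_init.clone()
    for step in range(n_steps):              # t = 0, h, 2h, ..., 1 - h
        t = step * h
        u = velocity_fn(t, x)                # (batch, N, vocab)
        vocab = u.shape[-1]
        # Per-coordinate distribution: delta_{X^i}(.) + h * u_t^i(., X)
        probs = F.one_hot(x, vocab).float() + h * u
        probs = probs.clamp_min(0.0)         # numerical safeguard, not in the paper
        probs = probs / probs.sum(dim=-1, keepdim=True)
        x = torch.distributions.Categorical(probs=probs).sample()  # (batch, N)
    return x
```

In practice the velocity would be built from the trained model's posterior predictions (eq. 24 or 22 in the paper); the clamping and renormalization are only guards against floating-point error.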
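
The Experiment Setup row can also be read as a training configuration; a hedged sketch follows. Only the numbers (Adam with β1 = 0.9, β2 = 0.999, peak learning rate 3e-4, 2500 warm-up steps, dropout 0.1) come from the quoted text; the placeholder `model` and the linear warm-up shape are assumptions.

```python
import torch

# Placeholder module; the paper reports a DiT-style transformer with dropout 0.1.
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, dropout=0.1)

# Adam with the betas and peak learning rate quoted in the setup.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, betas=(0.9, 0.999))

def warmup(step, warmup_steps=2500):
    # Linear warm-up over 2500 steps to the peak rate, then hold it constant.
    # The post-warm-up schedule is an assumption; the quote only states the warm-up.
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup)
```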