Self-Consistent Velocity Matching of Probability Flows
Authors: Lingxiao Li, Samuel Hurault, Justin M. Solomon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, our method recovers analytical solutions accurately when they are available and achieves superior performance in high dimensions with less training time compared to alternatives. |
| Researcher Affiliation | Academia | Lingxiao Li (MIT CSAIL, lingxiao@mit.edu); Samuel Hurault (Univ. Bordeaux, Bordeaux INP, CNRS, IMB, samuel.hurault@math.u-bordeaux.fr); Justin Solomon (MIT CSAIL, jsolomon@mit.edu) |
| Pseudocode | Yes | Our algorithm is summarized in Algorithm 1 in the appendix. |
| Open Source Code | No | The paper states 'We implemented our method using JAX [Bradbury et al., 2018] and FLAX [Heek et al., 2020]. See Appendix C for the implementation details.' and mentions basing its ICNN implementation on a third-party codebase. However, it does not provide an explicit link to its own open-source code for the described method. |
| Open Datasets | Yes | We use the first three images of digits 1, 2, 3 from the MNIST dataset. |
| Dataset Splits | No | The paper does not specify explicit training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'JAX [Bradbury et al., 2018]', 'FLAX [Heek et al., 2020]', 'Diffrax library [Kidger, 2021]', 'JAXopt [Blondel et al., 2021]', and 'Adam [Kingma and Ba, 2014]'. While libraries are named, specific version numbers (e.g., JAX v0.x.x, FLAX v0.y.y) are not provided in the text; only publication years of their corresponding papers are given. |
| Experiment Setup | Yes | Unless mentioned otherwise, we choose the following hyperparameters for Algorithm 1. We set Ntrain = 10^5 or 2×10^5, B = 1000, L = 10 or 20. We use Adam [Kingma and Ba, 2014] with a cosine decay learning rate scheduler, with initial learning rate 10^-3, the number of decay steps equal to Ntrain, and α = 0.01 (so the final learning rate is 10^-5). Since we are effectively performing gradient descent using a biased gradient, we set β₂ = 0.9 in Adam (instead of the default β₂ = 0.999) so that the statistics in Adam are updated more quickly; we found this tweak improves the results noticeably. The numerical integration for NODE is done using the Diffrax library [Kidger, 2021] with a relative and absolute tolerance of 10^-4. For ICNNs, we use hidden layer sizes 64, 128, 128, 64. The quadratic rank for the convex quadratic skip connections is set to 20. The activation layer is taken to be CELU. For each JKO step, we perform 1000 stochastic gradient descent steps using the Adam optimizer with a learning rate of 10^-3, except for the mixture-of-Gaussians experiment, where we use 2000 steps. We use 5000 particles. For score estimation, we use the same network architecture as in NODE. At each time step, we optimize the score network for 100 steps. |
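
The optimizer configuration quoted in the Experiment Setup row can be written down as a short sketch. This is not the authors' released code: the paper names Adam and a cosine decay schedule but not a specific optimizer library, so the use of Optax below is an assumption, and `N_train` is an illustrative placeholder.

```python
# Minimal sketch of the quoted optimizer setup, assuming Optax (not confirmed
# by the paper) is used alongside JAX/FLAX.
import optax

N_train = 100_000  # 10^5 training iterations (the paper also uses 2 * 10^5)

# Cosine decay from 1e-3 down to alpha * init_value = 1e-5 over N_train steps.
lr_schedule = optax.cosine_decay_schedule(
    init_value=1e-3,
    decay_steps=N_train,
    alpha=0.01,
)

# Adam with beta_2 lowered from the default 0.999 to 0.9 so the second-moment
# statistics adapt more quickly to the biased gradient estimates.
optimizer = optax.adam(learning_rate=lr_schedule, b2=0.9)
```

The NODE integration tolerances (rtol = atol = 10^-4) map onto a Diffrax call roughly as follows; `velocity_field` is a stand-in for the learned velocity network, not a function from the paper.

```python
import diffrax
import jax.numpy as jnp

def velocity_field(t, x, args):
    # Placeholder dynamics; the paper integrates a learned velocity field here.
    return -x

solution = diffrax.diffeqsolve(
    diffrax.ODETerm(velocity_field),
    diffrax.Dopri5(),
    t0=0.0,
    t1=1.0,
    dt0=None,  # let the adaptive controller choose the initial step
    y0=jnp.ones(2),
    stepsize_controller=diffrax.PIDController(rtol=1e-4, atol=1e-4),
)
```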