FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models
Authors: Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate FFJORD on a variety of density estimation tasks, and for approximate inference in variational autoencoders (Kingma & Welling, 2014). Experiments were conducted using a suite of GPU-based ODE-solvers and an implementation of the adjoint method for backpropagation. |
| Researcher Affiliation | Collaboration | Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, David Duvenaud... University of Toronto and Vector Institute; OpenAI. |
| Pseudocode | Yes | Algorithm 1: "Unbiased stochastic log-density estimation using the FFJORD model" (a minimal sketch of this procedure appears after the table). |
| Open Source Code | Yes | Code can be found at https://github.com/rtqichen/ffjord and https://github.com/rtqichen/torchdiffeq. |
| Open Datasets | Yes | We perform density estimation on five tabular datasets preprocessed as in Papamakarios et al. (2017) and two image datasets: MNIST and CIFAR10. |
| Dataset Splits | No | The paper mentions using 'validation improvement' as an early stopping criterion, but does not provide specific split percentages or sample counts for training, validation, and test sets. For example: 'Our experimental procedure exactly mirrors that of Berg et al. (2018). We use the same 7-layer encoder and decoder, learning rate (.001), optimizer (Adam; Kingma & Ba, 2015), batch size (100), and early stopping procedure (stop after 100 epochs of no validation improvement).' |
| Hardware Specification | No | The paper mentions 'GPU-based adaptive ODE-solvers' and 'Training took place on six GPUs' but does not specify exact GPU models (e.g., NVIDIA A100), CPU models, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions using 'torchdiffeq' and 'Adam optimizer', but does not provide specific version numbers for these or any other software components (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | In all experiments the Runge-Kutta 4(5) algorithm with the tableau from Shampine (1986) was used to solve the ODEs. We ensure tolerance is set low enough so numerical error is negligible; see Appendix C. ... On the tabular datasets we performed a grid-search over network architectures. ... Both models were trained with the Adam optimizer (Kingma & Ba, 2015). We trained for 500 epochs with a learning rate of .001 which was decayed to .0001 after 250 epochs. ... On the tabular datasets we used batch sizes up to 10,000 and on the image datasets we used a batch size of 900. ... Our experimental procedure exactly mirrors that of Berg et al. (2018). We use the same 7-layer encoder and decoder, learning rate (.001), optimizer (Adam; Kingma & Ba, 2015), batch size (100), and early stopping procedure (stop after 100 epochs of no validation improvement). (A toy sketch of the tabular optimization settings appears after the table.) |
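
The Pseudocode row above refers to Algorithm 1, which estimates log p(x) by integrating an augmented ODE in which the exact trace of the Jacobian is replaced by Hutchinson's stochastic estimator. The following is a minimal sketch of that procedure, assuming torchdiffeq's `odeint` interface; the dynamics function `f`, the integration interval [0, 1], the tolerances, and the use of `dopri5` (standing in here for the paper's Shampine-tableau Runge-Kutta 4(5)) are illustrative assumptions rather than the paper's exact configuration.

```python
import math

import torch
from torchdiffeq import odeint


def log_density(x, f, t0=0.0, t1=1.0, atol=1e-5, rtol=1e-5):
    """Unbiased estimate of log p(x) for a FFJORD model with dynamics f(t, z).

    Sketch of Algorithm 1: integrate [z, delta_logp] from the data back to the
    base distribution, estimating tr(df/dz) with Hutchinson's estimator.
    """
    eps = torch.randn_like(x)  # Hutchinson noise, fixed for the whole solve

    def augmented_dynamics(t, state):
        z, _ = state
        with torch.enable_grad():
            z.requires_grad_(True)
            dz = f(t, z)
            # eps^T (df/dz) eps is an unbiased estimator of tr(df/dz).
            eps_dfdz = torch.autograd.grad(dz, z, grad_outputs=eps, create_graph=True)[0]
            div = (eps_dfdz * eps).flatten(1).sum(dim=1)
        return dz, -div

    # Solve backwards from the data (t1) to the base distribution (t0).
    ts = torch.tensor([t1, t0]).to(x)
    zeros = torch.zeros(x.shape[0]).to(x)
    z_t, delta_logp_t = odeint(
        augmented_dynamics, (x, zeros), ts,
        atol=atol, rtol=rtol, method="dopri5",  # adaptive Runge-Kutta 4(5)
    )
    z0, delta_logp = z_t[-1], delta_logp_t[-1]

    # Standard-normal base density at z0.
    logp_z0 = -0.5 * (z0.pow(2) + math.log(2 * math.pi)).flatten(1).sum(dim=1)
    return logp_z0 - delta_logp
```

For training at scale, the paper backpropagates with the adjoint method; with torchdiffeq that corresponds to swapping `odeint` for `odeint_adjoint`, which avoids storing the solver's intermediate states.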
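
The Experiment Setup row quotes the tabular-data optimization settings: Adam at a learning rate of .001 decayed to .0001 after 250 of 500 epochs, with batch sizes up to 10,000. The toy loop below simply wires those numbers together around the `log_density` sketch above; the two-layer dynamics network and the random data are placeholders (the paper's architectures came from a grid search), and the paper trains with the adjoint method rather than by backpropagating through the solver as this sketch does.

```python
import torch
import torch.nn as nn

dynamics = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))  # placeholder dynamics
data = torch.randn(50_000, 2)  # stand-in for a preprocessed tabular dataset

optimizer = torch.optim.Adam(dynamics.parameters(), lr=1e-3)
# Decay the learning rate from .001 to .0001 after 250 of the 500 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[250], gamma=0.1)

for epoch in range(500):
    for x in data.split(10_000):  # "batch sizes up to 10,000" on the tabular datasets
        loss = -log_density(x, lambda t, z: dynamics(z)).mean()  # negative log-likelihood
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```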