Neural Networks with Cheap Differential Operators

Authors: Ricky T. Q. Chen, David K. Duvenaud

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate these cheap differential operators for solving root-finding subproblems in implicit ODE solvers, exact density evaluation for continuous normalizing flows, and evaluating the Fokker-Planck equation for training stochastic differential equation models. We compare a standard Runge-Kutta (RK) solver with adaptive stepping (Shampine, 1986) and a predictor-corrector Adams-Bashforth-Moulton (ABM) method in Figure 2. A learned ordinary differential equation is used as part of a continuous normalizing flow (discussed in Section 4), and training requires solving this ordinary differential equation at every iteration. (A hedged code sketch of the cheap dimension-wise derivative computation appears after this table.)
Researcher Affiliation | Academia | Ricky T. Q. Chen, David Duvenaud, University of Toronto and Vector Institute, {rtqichen,duvenaud}@cs.toronto.edu
Pseudocode | No | The paper describes methods using text and mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | Table 1 shows that training CNFs with exact trace using the HollowNet architecture can lead to improvements on standard benchmark datasets, static MNIST and Omniglot. Figure 3 contains comparisons of models trained using maximum likelihood on the MINIBOONE dataset preprocessed by Papamakarios et al. (2017).
Dataset Splits | No | The paper mentions using “standard benchmark datasets” such as static MNIST and Omniglot, but it does not specify split percentages, sample counts, or the methodology used to divide data into training, validation, and test sets.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run experiments, such as GPU/CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions deep learning frameworks such as TensorFlow and PyTorch in the context of specific functionalities (stop_gradient, detach), with citations to their respective papers, but it does not specify version numbers for the software dependencies used in the implementation (e.g., Python, PyTorch/TensorFlow, CUDA).
Experiment Setup | Yes | We searched over d_h ∈ {32, 64, 100} and used d_h = 100, as the computational cost was not significantly impacted. We used 2-3 hidden layers for the conditioner and transformer networks, with the ELU activation function. We use 3-hidden-layer deep neural networks to parameterize the SDE and density models with the Swish nonlinearity (Ramachandran et al., 2017), and use m = 5 Gaussian mixtures. λ is a non-negative weight that is annealed to zero by the end of training. (A hedged sketch of this network setup follows the code block below.)
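
Sketch for the Research Type and Software Dependencies rows: the paper's HollowNet obtains exact dimension-wise derivatives (the diagonal of the Jacobian) in a single backward pass by holding the conditioner output fixed with detach/stop_gradient, since the i-th conditioner output does not depend on x_i. The code below is a minimal sketch of that idea in PyTorch, not the authors' implementation: the class names (HollowConditioner, DimwiseTransformer), the per-dimension input masking used to enforce the hollow structure, and the layer widths are illustrative assumptions; only the ELU activation and the detach trick come from the paper.

```python
import torch
import torch.nn as nn


class HollowConditioner(nn.Module):
    """Conditioner h_i(x_{-i}): the i-th hidden vector must not depend on x_i.
    Here the hollow property is enforced by zeroing x_i before a shared MLP,
    which is simple but builds d masked copies of the input."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.dim = dim
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
        )

    def forward(self, x):
        # x: (batch, dim) -> (batch, dim, dim), where slice i has x_i zeroed
        mask = 1.0 - torch.eye(self.dim, device=x.device, dtype=x.dtype)
        x_masked = x.unsqueeze(1) * mask
        return self.net(x_masked)  # (batch, dim, hidden)


class DimwiseTransformer(nn.Module):
    """Transformer tau_i(x_i, h_i), applied independently to each dimension."""

    def __init__(self, hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1 + hidden, hidden), nn.ELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, h):
        # x: (batch, dim), h: (batch, dim, hidden)
        inp = torch.cat([x.unsqueeze(-1), h], dim=-1)
        return self.net(inp).squeeze(-1)  # (batch, dim)


def f_and_dimwise_derivatives(conditioner, transformer, x):
    """Return f(x) and the exact diagonal of its Jacobian in one backward pass.
    Because h_i does not depend on x_i, differentiating tau with h held
    constant (detached) gives d f_i / d x_i exactly."""
    x = x.requires_grad_(True)
    h = conditioner(x)
    f = transformer(x, h)                    # value with full gradient paths
    f_blocked = transformer(x, h.detach())   # same value; gradients only via x_i
    diag = torch.autograd.grad(f_blocked.sum(), x, create_graph=True)[0]
    return f, diag


if __name__ == "__main__":
    dim, hidden = 5, 32
    cond, trans = HollowConditioner(dim, hidden), DimwiseTransformer(hidden)
    x = torch.randn(4, dim)
    f, diag = f_and_dimwise_derivatives(cond, trans, x)
    # Check against a brute-force autograd Jacobian for one batch element.
    J = torch.autograd.functional.jacobian(lambda z: trans(z, cond(z)), x[:1])
    print(torch.allclose(diag[0], torch.diagonal(J[0, :, 0, :]), atol=1e-5))
```

The O(d) masked copies in HollowConditioner keep the sketch short; the paper instead uses a single masked network so the conditioner costs roughly one forward pass.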
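Sketch for the Experiment Setup row: a 3-hidden-layer MLP with the Swish nonlinearity and a mixture-of-Gaussians density head with m = 5 components, as quoted above. The hidden width (64), the choice to condition the mixture parameters on t, and the names Swish, mlp, and GaussianMixtureDensity are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn


class Swish(nn.Module):
    # Swish nonlinearity (Ramachandran et al., 2017): x * sigmoid(x)
    def forward(self, x):
        return x * torch.sigmoid(x)


def mlp(in_dim, out_dim, hidden=64, n_hidden=3):
    """3-hidden-layer MLP with Swish activations (width 64 is an assumption)."""
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, hidden), Swish()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


class GaussianMixtureDensity(nn.Module):
    """Density model as a mixture of m Gaussians whose logits, means, and
    log-scales are produced by a Swish MLP conditioned on t (assumed)."""

    def __init__(self, dim, m=5, hidden=64):
        super().__init__()
        self.dim, self.m = dim, m
        self.net = mlp(1, m * (1 + 2 * dim), hidden)

    def log_prob(self, x, t):
        # x: (batch, dim), t: (batch, 1) -> log p(x | t): (batch,)
        params = self.net(t)
        logits = params[:, :self.m]
        means, log_scales = params[:, self.m:].chunk(2, dim=-1)
        means = means.view(-1, self.m, self.dim)
        log_scales = log_scales.view(-1, self.m, self.dim)
        comp = torch.distributions.Normal(means, log_scales.exp())
        log_px = comp.log_prob(x.unsqueeze(1)).sum(-1)   # (batch, m)
        log_w = torch.log_softmax(logits, dim=-1)        # (batch, m)
        return torch.logsumexp(log_w + log_px, dim=-1)   # (batch,)


if __name__ == "__main__":
    dim = 2
    drift = mlp(dim + 1, dim)              # e.g. SDE drift as a function of (x, t)
    density = GaussianMixtureDensity(dim, m=5)
    x, t = torch.randn(8, dim), torch.rand(8, 1)
    print(drift(torch.cat([x, t], dim=-1)).shape)   # torch.Size([8, 2])
    print(density.log_prob(x, t).shape)             # torch.Size([8])
```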