Neural Networks with Cheap Differential Operators
Authors: Ricky T. Q. Chen, David K. Duvenaud
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate these cheap differential operators for solving root-finding subproblems in implicit ODE solvers, exact density evaluation for continuous normalizing flows, and evaluating the Fokker-Planck equation for training stochastic differential equation models. We compare a standard Runge-Kutta (RK) solver with adaptive stepping (Shampine, 1986) and a predictor-corrector Adams-Bashforth-Moulton (ABM) method in Figure 2. A learned ordinary differential equation is used as part of a continuous normalizing flow (discussed in Section 4), and training requires solving this ordinary differential equation at every iteration. (See the illustrative HollowNet sketch after the table.) |
| Researcher Affiliation | Academia | Ricky T. Q. Chen, David Duvenaud University of Toronto, Vector Institute {rtqichen,duvenaud}@cs.toronto.edu |
| Pseudocode | No | The paper describes methods using text and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | Table 1 shows that training CNFs with exact trace using the Hollow Net architecture can lead to improvements on standard benchmark datasets, static MNIST and Omniglot. Figure 3 contains comparisons of models trained using maximum likelihood on the MINIBOONE dataset preprocessed by Papamakarios et al. (2017). |
| Dataset Splits | No | The paper mentions using “standard benchmark datasets” such as static MNIST and Omniglot, but it does not provide specific percentages, sample counts, or a detailed splitting methodology for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run experiments, such as GPU/CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions deep learning frameworks like TensorFlow and PyTorch in the context of specific functionalities (stop_gradient, detach) with citations to their respective papers, but it does not specify version numbers for the software dependencies used in their implementation (e.g., Python, PyTorch/TensorFlow versions, CUDA). |
| Experiment Setup | Yes | We searched for d_h ∈ {32, 64, 100} and used d_h = 100, as the computational cost was not significantly impacted. We used 2-3 hidden layers for the conditioner and transformer networks, with the ELU activation function. We use 3-hidden-layer deep neural networks to parameterize the SDE and density models with the Swish nonlinearity (Ramachandran et al., 2017), and use m = 5 Gaussian mixtures. λ is a non-negative weight that is annealed to zero by the end of training. (Illustrative sketches of these parameterizations follow the table.) |
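
The cheap differential operators referenced in the table (Research Type and Open Datasets rows) come from the paper's HollowNet construction: each per-dimension hidden state is computed without access to its own input dimension, and a stop-gradient (`detach`) on those hidden states lets the exact Jacobian diagonal fall out of a single backward pass. The sketch below is an illustrative reconstruction under assumptions, not the authors' released code: the class name `HollowNetSketch`, the weight-shared conditioner and transformer, and the naive per-dimension masking loop are choices made for brevity.

```python
import torch
import torch.nn as nn


class HollowNetSketch(nn.Module):
    """Minimal sketch of a HollowNet-style network whose Jacobian diagonal
    is exact yet costs only one extra backward pass (not the authors' code).

    Each hidden state h_i is computed from x with dimension i masked out
    ("hollow"), so f_i depends on x_i only through the per-dimension
    transformer. Detaching h makes sum(f) depend on each x_i only via f_i,
    so one backward pass recovers diag(df/dx) exactly.
    """

    def __init__(self, dim, hidden=100):  # d_h = 100 as in the setup row
        super().__init__()
        self.dim = dim
        # Conditioner c_i(x_{-i}) -> h_i (weights shared across dimensions here).
        self.conditioner = nn.Sequential(
            nn.Linear(dim, hidden), nn.ELU(), nn.Linear(hidden, hidden))
        # Transformer tau_i(x_i, h_i) -> f_i (also shared across dimensions).
        self.transformer = nn.Sequential(
            nn.Linear(hidden + 1, hidden), nn.ELU(), nn.Linear(hidden, 1))

    def _hollow_hidden(self, x):
        # Naive O(d) masking purely for illustration; a masked (MADE-style)
        # conditioner would produce all h_i in a single forward pass.
        hs = []
        for i in range(self.dim):
            mask = torch.ones_like(x)
            mask[:, i] = 0.0
            hs.append(self.conditioner(x * mask))
        return torch.stack(hs, dim=1)                  # (batch, dim, hidden)

    def forward(self, x, detach_hidden=False):
        h = self._hollow_hidden(x)
        if detach_hidden:
            h = h.detach()                             # stop-gradient trick
        xi = x.unsqueeze(-1)                           # (batch, dim, 1)
        return self.transformer(torch.cat([xi, h], dim=-1)).squeeze(-1)

    def jacobian_diagonal(self, x):
        """Exact diag(df/dx) from one backward pass over the detached-h output."""
        x = x.detach().requires_grad_(True)
        f = self.forward(x, detach_hidden=True)
        return torch.autograd.grad(f.sum(), x, create_graph=True)[0]


if __name__ == "__main__":
    torch.manual_seed(0)
    net, x = HollowNetSketch(dim=4), torch.randn(8, 4)
    diag = net.jacobian_diagonal(x)     # exact trace for a CNF is diag.sum(-1)
    print(diag.shape)                   # torch.Size([8, 4])
```

Summing this exact diagonal gives the exact Jacobian trace, which is what allows continuous normalizing flows to be trained with exact density evaluation instead of a stochastic trace estimator.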
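
The Experiment Setup row also describes the density model as a 3-hidden-layer MLP with the Swish nonlinearity producing the parameters of an m = 5 Gaussian mixture. The following sketch is only a plausible reading of that description; the layer width, the time-conditioned input, and the class name `MixtureDensitySketch` are assumptions, and the annealed λ weight is not modeled here.

```python
import torch
import torch.nn as nn


class Swish(nn.Module):
    """Swish activation x * sigmoid(x) (Ramachandran et al., 2017)."""
    def forward(self, x):
        return x * torch.sigmoid(x)


class MixtureDensitySketch(nn.Module):
    """Hypothetical reading of the density model in the Experiment Setup row:
    a 3-hidden-layer Swish MLP mapping a conditioning input (e.g. time t)
    to the logits, means, and log-scales of an m = 5 Gaussian mixture.
    """

    def __init__(self, in_dim=1, hidden=64, n_mix=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), Swish(),
            nn.Linear(hidden, hidden), Swish(),
            nn.Linear(hidden, hidden), Swish(),
            nn.Linear(hidden, 3 * n_mix),      # logits | means | log-scales
        )

    def log_prob(self, x, t):
        """log p_t(x) with x of shape (batch, 1) and t of shape (batch, in_dim)."""
        logits, means, log_scales = self.net(t).chunk(3, dim=-1)
        comp = torch.distributions.Normal(means, log_scales.exp())
        log_px = comp.log_prob(x) + torch.log_softmax(logits, dim=-1)
        return torch.logsumexp(log_px, dim=-1)


if __name__ == "__main__":
    model = MixtureDensitySketch()
    t = torch.rand(16, 1)                      # conditioning input, e.g. time
    x = torch.randn(16, 1)                     # points at which to evaluate p_t
    print(model.log_prob(x, t).shape)          # torch.Size([16])
```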