Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver Heuristics
Authors: Avik Pal, Yingbo Ma, Viral Shah, Christopher V. Rackauckas
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this manuscript we show a generally applicable method to force the neural differential equation training process to choose the least expensive option. We demonstrate how our approach can halve the prediction time and, unlike other methods which can increase the training time by an order of magnitude, we demonstrate a similar reduction in training times. |
| Researcher Affiliation | Collaboration | ¹Indian Institute of Technology Kanpur, ²Julia Computing, ³Massachusetts Institute of Technology, ⁴Pumas AI, ⁵University of Maryland Baltimore. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code¹, implemented using the Julia Programming Language (Bezanson et al., 2017) and SciML Software Suite (Rackauckas et al., 2019), with the intention of wider adoption of the proposed methods in the community. ¹https://github.com/avik-pal/RegNeuralODE.jl |
| Open Datasets | Yes | We test our regularization on four tasks: supervised image classification (Section 4.1.1) and time series interpolation (Section 4.1.2) using Neural ODE... For Section 4.1.1: "MNIST Images". For Section 4.1.2: "ICU Patients for PhysioNet Challenge 2012 Dataset (Silva et al., 2012)". |
| Dataset Splits | No | The paper mentions an "80:20 split of the data for training and evaluation" for the PhysioNet dataset, but does not explicitly define a separate validation split or set using terms like "validation set" or "validation split". |
| Hardware Specification | No | The paper states "The wall clock timings represent runs on a CPU" but does not specify any particular CPU model, GPU, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions using the "Julia Programming Language" and the "SciML Software Suite", "DiffEqFlux", and "Flux" for experiments. However, it does not provide specific version numbers for these software components, which are required for reproducibility. |
| Experiment Setup | Yes | We use a batch size of 512 and train the model for 75 epochs using Momentum (Qian, 1999) with a learning rate of 0.1 and mass of 0.9, and a learning-rate inverse decay of 10⁻⁵ per iteration. For Error Estimate Regularization, we perform exponential annealing of the regularization coefficient from 100.0 to 10.0 over 75 epochs. For Stiffness Regularization, we use a constant coefficient of 0.0285. (A configuration sketch in Julia follows the table.) |
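For orientation, the optimizer and annealing schedule reported above can be sketched in Julia with Flux.jl, the ecosystem the paper builds on. This is a minimal illustration under stated assumptions, not the authors' released code: the toy `Dense` model and random batch stand in for the actual Neural ODE, the solver-error regularization term is only indicated in a comment, and we assume the reported "inverse decay" refers to Flux's standard `InvDecay` schedule (η / (1 + γ·n)).

```julia
using Flux
using Flux.Optimise: Momentum, InvDecay, Optimiser

model = Dense(4 => 1)                       # placeholder; the paper trains a Neural ODE
loss(x, y) = Flux.Losses.mse(model(x), y)   # placeholder loss

# Momentum with learning rate 0.1 and mass 0.9, composed with an inverse
# learning-rate decay of 10⁻⁵ per iteration, as reported in the paper.
opt = Optimiser(InvDecay(1e-5), Momentum(0.1, 0.9))

nepochs = 75

# Exponential annealing of the error-estimate regularization coefficient
# from 100.0 at epoch 1 down to 10.0 at epoch 75.
λ(epoch) = 100.0 * (10.0 / 100.0)^((epoch - 1) / (nepochs - 1))

x, y = rand(Float32, 4, 512), rand(Float32, 1, 512)  # one stand-in batch of 512
ps = Flux.params(model)
for epoch in 1:nepochs
    # In the real setup the objective would add λ(epoch) times the solver's
    # internal error-estimate term; the regularizer is omitted in this toy sketch.
    gs = gradient(() -> loss(x, y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```

For Stiffness Regularization, the same loop would use the reported constant coefficient 0.0285 in place of the annealed `λ(epoch)`.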