Second-Order Neural ODE Optimizer
Authors: Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our resulting method, named SNOpt, converges much faster than first-order baselines in wall-clock time, and the improvement remains consistent across various applications, e.g. image classification, generative flow, and time-series prediction. Table 1: Numerical errors between ground-truth and adjoint derivatives using different ODESolve on CIFAR10. Section 4: Experiments |
| Researcher Affiliation | Academia | Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou Georgia Institute of Technology, USA {ghliu, tianrong.chen, evangelos.theodorou}@gatech.edu |
| Pseudocode | Yes | Algorithm 1 SNOpt: Second-order Neural ODE Optimizer (a hedged illustration of a generic second-order update appears after this table) |
| Open Source Code | Yes | Our code is available at https://github.com/ghliu/snopt. |
| Open Datasets | Yes | Dataset. We select 9 datasets from 3 distinct applications where N-ODEs have been applied, including image classification, time-series prediction, and continuous normalizing flow (CNF): MNIST, SVHN, CIFAR10: MNIST consists of 28×28 gray-scale images, while SVHN and CIFAR10 consist of 3×32×32 colour images. All 3 image datasets have 10 label classes. SpoAD, ArtWR, CharT: We consider the UEA time series archive (Bagnall et al., 2018). Spoken Arabic Digits (SpoAD) is a speech dataset, whereas Articulary Word Recognition (ArtWR) and Character Trajectories (CharT) are motion-related datasets. Table 3 details their sample sizes. Circle, Gas, Miniboone: Circle is a 2-dim synthetic dataset adopted from Chen et al. (2018). Gas and Miniboone are 8- and 43-dim tabular datasets commonly used in CNF (Grathwohl et al., 2018; Onken et al., 2020). All 3 datasets transform a multivariate Gaussian to the target distributions. |
| Dataset Splits | No | The paper mentions training on various datasets and tuning hyperparameters, but it does not explicitly provide details about training/validation/test dataset splits, such as percentages or sample counts for each split. |
| Hardware Specification | Yes | All experiments are conducted on a TITAN RTX. |
| Software Dependencies | No | The paper names the solver and package but gives no version numbers: We use standard Runge-Kutta 4(5) adaptive solver (dopri5; Dormand & Prince (1980)) implemented by the torchdiffeq package. The numerical tolerance is set to 1e-6 for CNF and 1e-3 for the rest. (A hedged solver-call sketch follows the table.) |
| Experiment Setup | Yes | The numerical tolerance is set to 1e-6 for CNF and 1e-3 for the rest. The batch size is set to 256, 512, and 1000 respectively for Art Word, Char Traj, and Gas. The rest of the datasets use 128 as the batch size. (A hedged configuration sketch follows the table.) |
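The paper's Algorithm 1 is not reproduced here. As a rough illustration of the general shape of a second-order update, consider the damped-Newton sketch below on a toy quadratic. This is explicitly not SNOpt, whose preconditioner the paper derives from a second-order analysis of the ODE trajectory with a structured (feasible-scale) curvature approximation; every name in this sketch is our own:

```python
import torch

def damped_newton_step(params, loss_fn, damping=1e-3):
    """One generic damped-Newton update on a small parameter vector.

    NOT the paper's SNOpt algorithm -- only a toy illustration of the
    second-order update shape: precondition the gradient with a damped
    curvature matrix instead of stepping along the raw gradient.
    """
    loss = loss_fn(params)
    grad = torch.autograd.grad(loss, params, create_graph=True)[0]
    # A dense Hessian is feasible only for tiny problems; SNOpt instead
    # uses a structured curvature approximation.
    hess = torch.stack([
        torch.autograd.grad(grad[i], params, retain_graph=True)[0]
        for i in range(grad.numel())
    ])
    precond = hess + damping * torch.eye(grad.numel())
    step = torch.linalg.solve(precond, grad)
    with torch.no_grad():
        params -= step
    return loss.item()

# Toy usage: minimize an ill-conditioned quadratic.
params = torch.randn(3, requires_grad=True)
A = torch.diag(torch.tensor([1.0, 10.0, 100.0]))
for _ in range(5):
    print(damped_newton_step(params, lambda p: 0.5 * p @ A @ p))
```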
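To make the quoted solver settings concrete, here is a minimal sketch of calling torchdiffeq's dopri5 solver with the stated tolerances (1e-6 for CNF, 1e-3 for the other tasks). The `ODEFunc` network and its dimensions are placeholders of our own, not the paper's architectures:

```python
import torch
from torchdiffeq import odeint  # pip install torchdiffeq

class ODEFunc(torch.nn.Module):
    """A toy vector field f(t, y); the paper's networks are task-specific."""
    def __init__(self, dim=4):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 32), torch.nn.Tanh(), torch.nn.Linear(32, dim))

    def forward(self, t, y):
        return self.net(y)

# Tolerances as quoted: 1e-6 for CNF, 1e-3 for the rest.
tol = 1e-3  # use 1e-6 for continuous normalizing flows
func = ODEFunc()
y0 = torch.randn(8, 4)        # batch of initial states
t = torch.tensor([0.0, 1.0])  # integration interval
yT = odeint(func, y0, t, rtol=tol, atol=tol, method='dopri5')[-1]
```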
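Similarly, the quoted batch sizes can be wired into standard data loaders. The sketch below uses torchvision's stock datasets with their default train/test splits (the report above notes the paper does not specify splits); the `BATCH_SIZE` dict and `image_loader` helper are hypothetical scaffolding, not code from the SNOpt repository:

```python
import torch
from torchvision import datasets, transforms

# Batch sizes as quoted in the paper; all other datasets use 128.
BATCH_SIZE = {'ArtWR': 256, 'CharT': 512, 'Gas': 1000}

def image_loader(name, train=True):
    """Standard torchvision loaders for the three image datasets."""
    cls = {'MNIST': datasets.MNIST, 'SVHN': datasets.SVHN,
           'CIFAR10': datasets.CIFAR10}[name]
    # SVHN takes a `split` string instead of a `train` flag.
    kwargs = ({'split': 'train' if train else 'test'} if name == 'SVHN'
              else {'train': train})
    ds = cls(root='./data', download=True,
             transform=transforms.ToTensor(), **kwargs)
    return torch.utils.data.DataLoader(
        ds, batch_size=BATCH_SIZE.get(name, 128), shuffle=train)

loader = image_loader('CIFAR10')
```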