Second-Order Neural ODE Optimizer

Authors: Guan-Horng Liu, Tianrong Chen, Evangelos Theodorou

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our resulting method named SNOpt converges much faster than first-order baselines in wall-clock time, and the improvement remains consistent across various applications, e.g. image classification, generative flow, and time-series prediction." Supporting evidence also cites Table 1 ("Numerical errors between ground-truth and adjoint derivatives using different ODESolve on CIFAR10") and Section 4 (Experiments).
Researcher Affiliation | Academia | "Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou; Georgia Institute of Technology, USA; {ghliu, tianrong.chen, evangelos.theodorou}@gatech.edu"
Pseudocode | Yes | "Algorithm 1 SNOpt: Second-order Neural ODE Optimizer"
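Algorithm 1 itself is not reproduced on this page. As a rough illustration of the loop shape such a method follows (solve the ODE forward, backpropagate through the adjoint, precondition each gradient with a curvature estimate before updating), here is a minimal, hypothetical sketch in PyTorch. The damped diagonal (Adam-like) preconditioner is a stand-in only, not SNOpt's actual second-order matrices, and the learning rate, damping, and toy task are invented for the example:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # adjoint-based backprop

class ODEFunc(nn.Module):
    """Toy vector field dy/dt = f(t, y) parameterized by a small MLP."""
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc()
t_span = torch.tensor([0.0, 1.0])
lr, damping = 0.1, 1e-3          # hypothetical hyperparameters

for step in range(200):
    y0 = torch.randn(128, 2)     # toy batch
    target = torch.zeros_like(y0)
    yT = odeint(func, y0, t_span, rtol=1e-3, atol=1e-3)[-1]  # forward solve
    loss = ((yT - target) ** 2).mean()
    func.zero_grad()
    loss.backward()              # backward ODE via the adjoint method
    with torch.no_grad():
        for p in func.parameters():
            if p.grad is not None:
                # Damped diagonal curvature estimate; a stand-in for the
                # second-order preconditioning a method like SNOpt performs.
                precond = (p.grad ** 2 + damping).sqrt()
                p -= lr * p.grad / precond
```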
Open Source Code | Yes | "Our code is available at https://github.com/ghliu/snopt."
Open Datasets | Yes | "Dataset. We select 9 datasets from 3 distinct applications where N-ODEs have been applied, including image classification, time-series prediction, and continuous normalizing flow (CNF). MNIST, SVHN, CIFAR10: MNIST consists of 28×28 gray-scale images, while SVHN and CIFAR10 consist of 3×32×32 colour images. All 3 image datasets have 10 label classes. SpoAD, ArtWR, CharT: We consider the UEA time series archive (Bagnall et al., 2018). SpokenArabicDigits (SpoAD) is a speech dataset, whereas ArticularyWordRecognition (ArtWR) and CharacterTrajectories (CharT) are motion-related datasets. Table 3 details their sample sizes. Circle, Gas, Miniboone: Circle is a 2-dim synthetic dataset adopted from Chen et al. (2018). Gas and Miniboone are 8- and 43-dim tabular datasets commonly used in CNF (Grathwohl et al., 2018; Onken et al., 2020). All 3 datasets transform a multivariate Gaussian to the target distributions."
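For the image datasets above, a minimal loading sketch with torchvision (an assumption; the paper does not describe its data pipeline) would look like the following. The UEA time-series and tabular CNF datasets need separate loaders:

```python
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()
# 1x28x28 gray-scale digits, 10 classes
mnist = torchvision.datasets.MNIST("./data", train=True, download=True,
                                   transform=transform)
# 3x32x32 colour images, 10 classes each
svhn = torchvision.datasets.SVHN("./data", split="train", download=True,
                                 transform=transform)
cifar10 = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                       transform=transform)
```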
Dataset Splits | No | The paper mentions training on various datasets and tuning hyperparameters, but it does not explicitly provide train/validation/test split details, such as percentages or sample counts for each split.
Hardware Specification | Yes | "All experiments are conducted on a TITAN RTX."
Software Dependencies | No | The torchdiffeq package is named, but no version numbers are given: "We use standard Runge-Kutta 4(5) adaptive solver (dopri5; Dormand & Prince (1980)) implemented by the torchdiffeq package. The numerical tolerance is set to 1e-6 for CNF and 1e-3 for the rest."
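The quoted solver configuration maps directly onto torchdiffeq's odeint call. A minimal sketch, with a toy vector field standing in for the trained dynamics:

```python
import torch
from torchdiffeq import odeint

def f(t, y):
    # Toy linear dynamics; the paper's models are learned vector fields.
    return -y

y0 = torch.ones(3)
t = torch.linspace(0.0, 1.0, 10)
tol = 1e-3  # the paper uses 1e-6 for CNF, 1e-3 for everything else
y = odeint(f, y0, t, method="dopri5", rtol=tol, atol=tol)  # Runge-Kutta 4(5)
```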
Experiment Setup | Yes | "The numerical tolerance is set to 1e-6 for CNF and 1e-3 for the rest. The batch size is set to 256, 512, and 1000 respectively for ArtWR, CharT, and Gas. The rest of the datasets use 128 as the batch size."
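For completeness, the stated batch sizes can be expressed as a simple lookup when building data loaders. The make_loader helper and the default of 128 for unlisted datasets follow the quote above; everything else is illustrative:

```python
from torch.utils.data import DataLoader

BATCH_SIZE = {"ArtWR": 256, "CharT": 512, "Gas": 1000}  # all others: 128

def make_loader(name, dataset, train=True):
    # Hypothetical helper: pick the paper's batch size for the named dataset.
    return DataLoader(dataset, batch_size=BATCH_SIZE.get(name, 128),
                      shuffle=train)
```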