Optimizing Neural Networks via Koopman Operator Theory

Authors: Akshunna S. Dogra, William Redman

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below lists the reproducibility variable, the result, and the supporting LLM response.
Research Type: Experimental. We show that Koopman operator theoretic methods allow for accurate predictions of weights and biases of feedforward, fully connected deep networks over a non-trivial range of training time. During this window, we find that our approach is >10x faster than various gradient descent based methods (e.g. Adam, Adadelta, Adagrad), in line with our complexity analysis. We then present the results of Koopman training two different feedforward, fully connected, deep NNs: an NN differential equation (DE) solver and a classifier trained on the MNIST data set.
Researcher Affiliation: Academia. Akshunna S. Dogra, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, asdpsn@gmail.com; William T. Redman, Interdepartmental Graduate Program in Dynamical Neuroscience, University of California, Santa Barbara, Santa Barbara, CA 93106, wredman@ucsb.edu.
Pseudocode: Yes. Pseudo-code for how weight/bias data from standard training iterations t1 to t2 is used to approximate the Node Koopman operators, and how these operators are then used to predict the weight evolution from t2 + 1 to t2 + T, is given in Algorithm 1 ("Koopman training via Node Koopman operators"). An illustrative sketch of this procedure is given after the last entry below.
Open Source Code: No. The paper does not provide explicit links or statements about the availability of its source code.
Open Datasets: Yes. We then present the results of Koopman training two different feedforward, fully connected, deep NNs: an NN differential equation (DE) solver and a classifier trained on the MNIST data set. This NN was trained on the MNIST data set via stochastic Adadelta (Fig. 3a; see Sec. S4 for more details).
Dataset Splits: No. The paper distinguishes between training and validation loss, but it does not specify exact split percentages or sample counts for the validation set.
Hardware Specification: No. The paper mentions 'leveraging the immense parallelization capacities of modern GPUs' but does not specify particular GPU models, CPU models, or any other hardware configuration used for the experiments.
Software Dependencies: No. The paper mentions using 'the PyTorch environment [37]: Adam [38], Adagrad [39], and Adadelta [40]' but does not provide specific version numbers for PyTorch or the optimizers. Reference [37] cites the NeurIPS 2019 PyTorch paper, which implies a release from that year, but a numerical version (e.g., PyTorch 1.9) is not given.
Experiment Setup: Yes. For ease of implementation, we modified it to have sigmoidal activation functions, a fixed batch (making the optimization non-stochastic), a learning rate with a weak training-step-dependent decay, and trained with various optimizers via backpropagation (see Sec. S3 for more details). This NN was trained on the MNIST data set via stochastic Adadelta (Fig. 3a; see Sec. S4 for more details). An illustrative PyTorch sketch of this kind of setup follows the Koopman sketch below.
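
As a companion to the Pseudocode entry, the following is a minimal NumPy sketch of the kind of least-squares (DMD-style) step that Algorithm 1 describes: snapshots of a node's weights/biases over iterations t1 to t2 are used to fit a linear operator, which is then iterated to predict the evolution from t2 + 1 to t2 + T. The function names, the snapshot layout, and the plain pseudoinverse solve are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_node_koopman_operator(snapshots):
    """DMD-style least-squares fit of a linear operator K such that
    snapshots[:, t + 1] ~= K @ snapshots[:, t].

    snapshots: (d, n) array whose columns are one node's weights/biases
    recorded at consecutive training iterations t1, ..., t2.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    return Y @ np.linalg.pinv(X)  # K = Y X^+

def koopman_predict(K, w_t2, T):
    """Iterate the fitted operator to predict the T states after t2."""
    preds, w = [], w_t2
    for _ in range(T):
        w = K @ w
        preds.append(w)
    return np.stack(preds, axis=1)  # shape (d, T)

# Usage: fit on the recorded window, then skip ahead without backpropagation.
# K = fit_node_koopman_operator(snapshots)
# future = koopman_predict(K, snapshots[:, -1], T=100)
```

The prediction loop involves only matrix-vector products, which is the source of the speedup claimed relative to gradient-based updates; in practice the least-squares fit may need regularization, which the sketch omits.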
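The Experiment Setup and Software Dependencies entries together suggest a training configuration along the following lines. This is a hedged PyTorch sketch, not the authors' code: the layer sizes, batch size, learning rate, and decay schedule are assumptions chosen for illustration (the actual values are in Secs. S3 and S4 of the paper), and only the MNIST classifier trained with Adadelta is shown, not the NN differential-equation solver.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Fully connected network with sigmoidal activations; the layer sizes here
# are illustrative, not the architecture reported in the paper.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.Sigmoid(),
    nn.Linear(256, 128), nn.Sigmoid(),
    nn.Linear(128, 10),
)

train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=64, shuffle=True)

# Adadelta, as in the paper's MNIST experiment; Adam or Adagrad would be
# swapped in the same way for the optimizer comparisons.
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)
# A weak, training-step-dependent learning-rate decay (the exact schedule
# is an assumption for illustration).
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: 1.0 / (1.0 + 1e-4 * step))
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()
```

In the Koopman-training setting, weight/bias snapshots recorded during a window of this standard training would be passed to the fitting routine sketched above in place of continued backpropagation.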