Ode to an ODE

Authors: Krzysztof M. Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter-flow is constrained to lie on the compact manifold, provides stability and effectiveness of training and provably solves the gradient vanishing/explosion problem which is intrinsically related to training deep neural network architectures such as Neural ODEs. Consequently, it leads to better downstream models, as we show on the example of training reinforcement learning policies with evolution strategies, and in the supervised learning setting, by comparing with previous SOTA baselines. We provide strong convergence results for our proposed mechanism that are independent of the depth of the network, supporting our empirical studies. Our results show an intriguing connection between the theory of deep neural networks and the field of matrix flows on compact manifolds. [...] We run two sets of experiments comparing ODEtoODE with several other methods in the supervised setting and to train RL policies with ES.
Researcher Affiliation | Collaboration | Krzysztof Choromanski (Robotics at Google NY); Jared Quincy Davis (DeepMind & Stanford University); Valerii Likhosherstov (University of Cambridge); Xingyou Song (Google Brain); Jean-Jacques Slotine (Massachusetts Institute of Technology); Jacob Varley (Robotics at Google NY); Honglak Lee (Google Brain); Adrian Weller (University of Cambridge & The Alan Turing Institute); Vikas Sindhwani (Robotics at Google NY)
Pseudocode | No | The paper includes mathematical formulas and schematic diagrams (Figure 1), but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link to its source code for the methodology described.
Open Datasets | Yes | We then use the above framework to outperform previous Neural ODE variants and baseline architectures on RL tasks from OpenAI Gym and the DeepMind Control Suite, and simultaneously to yield strong results on image classification tasks. [...] All our supervised learning experiments use the Corrupted MNIST [41] (11 different variants) dataset.
Dataset Splits | No | The paper mentions using Corrupted MNIST and OpenAI Gym/DeepMind Control Suite environments, but does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using the scipy.linalg.expm function but does not specify version numbers for any software or libraries used in the experiments.
Experiment Setup | Yes | In all Neural ODE methods we were integrating on the time interval [0, T] for T = 1 and applied discretization with integration step size η = 0.04 (in our ODEtoODE we used that η for both: the main flow and the orthogonal flow on O(d)). The dimensionality of the embedding of the input state s was chosen to be h = 64 for OpenAI Gym Humanoid (for all methods but HyperNet, where we chose h = 16, see: discussion below) and h = 16 for all other tasks. [...] We used standard deviation stddev = 0.1 of the Gaussian noise defining ES perturbations, ES-gradient step size δ = 0.01 and function σ(x) = |x| as a nonlinear mapping. In all experiments we used k = 200 perturbations per iteration [15]. [...] For all models in Table 1, we did not use dropout, applied hidden width w = 128, and trained for 100 epochs. For all models in Table 2, we used dropout with r = 0.1 rate, hidden width w = 256, and trained for 100 epochs. For ODEtoODE variants, we used a discretization of η = 0.01. (Illustrative sketches of the nested-flow discretization and of the ES update based on these quoted settings follow the table.)
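The nested-flow mechanism summarized under Research Type, combined with the settings quoted under Experiment Setup (integration on [0, T] with T = 1, shared step size η = 0.04, σ(x) = |x|) and the scipy.linalg.expm function noted under Software Dependencies, can be illustrated with a minimal Euler-discretization sketch. This is not the authors' implementation: the bias-free update and the skew-symmetric parameterization of the generator of the orthogonal flow are assumptions made only for illustration.

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential noted under Software Dependencies

def odetoode_forward(x, W0, A, eta=0.04, T=1.0, sigma=np.abs):
    """Euler discretization of the nested ODEtoODE flows (illustrative sketch).

    x     -- (h,) input embedding; the paper uses h = 64 (Humanoid) or h = 16
    W0    -- (h, h) initial orthogonal weight matrix
    A     -- (h, h) learned matrix; its skew-symmetric part generates the flow
             on O(h), keeping W orthogonal (up to numerical error) at every step
    eta   -- shared step size for the main and orthogonal flows (0.04 in the RL runs)
    sigma -- nonlinearity; sigma(x) = |x| in the ES experiments
    """
    Omega = A - A.T            # skew-symmetric generator (assumed parameterization)
    Q = expm(eta * Omega)      # one-step transition; orthogonal because Omega is skew-symmetric
    W = W0
    for _ in range(int(round(T / eta))):
        x = x + eta * sigma(W @ x)   # main flow: dx/dt = sigma(W(t) x(t))
        W = W @ Q                    # parameter flow on the orthogonal group O(h)
    return x
```

In the paper the generator of the orthogonal flow is itself learned and time-dependent; the constant generator above is used only to show why the matrix-exponential step keeps the weights on O(h).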
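The ES hyperparameters quoted under Experiment Setup (k = 200 Gaussian perturbations per iteration, noise standard deviation 0.1, ES-gradient step size δ = 0.01) correspond to a standard evolution-strategies gradient estimate; a minimal sketch follows. The helper name episode_return and the plain (non-antithetic, unnormalized) estimator are assumptions, not details taken from the paper.

```python
import numpy as np

def es_step(theta, episode_return, k=200, stddev=0.1, delta=0.01, rng=None):
    """One evolution-strategies update (illustrative sketch).

    theta          -- flat policy parameter vector
    episode_return -- callable mapping a parameter vector to a scalar return
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((k, theta.size))              # k Gaussian perturbations
    returns = np.array([episode_return(theta + stddev * e) for e in eps])
    grad = (returns[:, None] * eps).mean(axis=0) / stddev   # vanilla ES gradient estimate
    return theta + delta * grad                             # ES-gradient step, delta = 0.01
```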