Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems

Authors: Suhas Kowshik, Dheeraj Nagaraj, Prateek Jain, Praneeth Netrapalli

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our results via simulations and demonstrate that a naive application of SGD can be highly sub-optimal. Indeed, our work demonstrates that for correlated data, specialized methods designed for the dependency structure in data can significantly outperform standard SGD based methods. ... Figure 2: Performance of various algorithms for the case of φ = Leaky ReLU
Researcher Affiliation | Collaboration | Prateek Jain, Google AI Research Lab, Bengaluru, India 560016, prajain@google.com; Suhas S Kowshik, Department of EECS, MIT, Cambridge, MA 02139, suhask@mit.edu; Dheeraj Nagaraj, Department of EECS, MIT, Cambridge, MA 02139, dheeraj@mit.edu; Praneeth Netrapalli, Google AI Research Lab, Bengaluru, India 560016, pnetrapalli@google.com
Pseudocode | Yes | Algorithm 1: Quasi Newton Method ... Algorithm 2: SGD-RER
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | No | Synthetic data: We sample data from NLDS(A*, µ, φ) where µ ∼ N(0, σ²I) and A* ∈ R^(d×d) is generated from the "RandBiMod" distribution.
Dataset Splits | No | The paper mentions generating synthetic data for experiments but does not explicitly describe train, validation, or test splits. It refers to a total 'horizon T = 10^5' but does not say how it was partitioned.
Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] The experiments run on a standard computer within 1 min.
Software Dependencies | No | The paper does not specify software dependencies with version numbers.
Experiment Setup | Yes | Algorithm Parameters: We set B = 240 and u = 10 for the buffer size and gap size respectively for both SGD-RER and SGD-ER and use full averaging (i.e., θ = 0 in Algorithm 2). We set the step size γ = 5 log T / T for SGD, SGD-RER, and SGD-ER and γ_newton = 0.2 and γ_GLMtron = 0.017.
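
To make the reported setup concrete, the following is a minimal Python sketch, not the authors' code. It plugs in the quoted values (T = 10^5, B = 240, u = 10, γ = 5 log T / T) and makes several assumptions that are not in the text above: the dynamics are taken to be X_{t+1} = φ(A* X_t) + η_t with Gaussian noise, the logarithm is natural, the leaky ReLU slope is 0.5, and, because the "RandBiMod" distribution is not described here, `make_stable_matrix` is a hypothetical stand-in that rescales a Gaussian matrix. The reverse-replay loop is a simplified reading of the buffer and gap description, and iterate averaging is omitted.

```python
import math
import numpy as np

# Values quoted in the experiment setup.
T = 10**5          # horizon
B, U = 240, 10     # buffer size and gap size
GAMMA = 5 * math.log(T) / T  # step size; natural log is an assumption


def leaky_relu(x, slope=0.5):
    # The slope is a hypothetical choice; the summary does not state it.
    return np.where(x >= 0, x, slope * x)


def make_stable_matrix(d, rho, rng):
    # Hypothetical stand-in for "RandBiMod": a Gaussian matrix rescaled
    # so that its spectral norm equals rho (< 1 keeps the system stable).
    G = rng.standard_normal((d, d))
    return rho * G / np.linalg.norm(G, 2)


def sample_nlds(A, sigma, horizon, phi, rng):
    # Assumed dynamics: X_{t+1} = phi(A X_t) + eta_t, eta_t ~ N(0, sigma^2 I).
    d = A.shape[0]
    X = np.zeros((horizon + 1, d))
    for t in range(horizon):
        X[t + 1] = phi(A @ X[t]) + sigma * rng.standard_normal(d)
    return X


def reverse_replay_order(horizon, buffer_size, gap):
    # Split time into consecutive blocks of length buffer_size + gap and,
    # inside each block, visit the newest buffer_size indices in reverse.
    # The gap oldest indices of every block are skipped.
    block = buffer_size + gap
    for start in range(0, horizon - block + 1, block):
        for i in range(buffer_size):
            yield start + block - 1 - i


def sgd_reverse_replay(X, buffer_size, gap, gamma, phi):
    # Simplified reverse-replay SGD on the one-step prediction residual.
    # Returns the last iterate; the averaging used in the paper is omitted.
    d = X.shape[1]
    A_hat = np.zeros((d, d))
    for t in reverse_replay_order(X.shape[0] - 1, buffer_size, gap):
        residual = phi(A_hat @ X[t]) - X[t + 1]
        A_hat -= 2.0 * gamma * np.outer(residual, X[t])
    return A_hat
```

A call such as `sgd_reverse_replay(sample_nlds(A, 1.0, T, leaky_relu, rng), B, U, GAMMA, leaky_relu)` with `rng = np.random.default_rng(0)` and `A = make_stable_matrix(5, 0.9, rng)` would run the sketch end to end; the dimension 5, the noise level 1.0, and the norm 0.9 are illustrative choices only.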