Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems
Authors: Suhas Kowshik, Dheeraj Nagaraj, Prateek Jain, Praneeth Netrapalli
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our results via simulations and demonstrate that a naive application of SGD can be highly sub-optimal. Indeed, our work demonstrates that for correlated data, specialized methods designed for the dependency structure in data can significantly outperform standard SGD based methods. ... Figure 2: Performance of various algorithms for the case of φ = Leaky ReLU |
| Researcher Affiliation | Collaboration | Prateek Jain, Google AI Research Lab, Bengaluru, India 560016, prajain@google.com; Suhas S Kowshik, Department of EECS, MIT, Cambridge, MA 02139, suhask@mit.edu; Dheeraj Nagaraj, Department of EECS, MIT, Cambridge, MA 02139, dheeraj@mit.edu; Praneeth Netrapalli, Google AI Research Lab, Bengaluru, India 560016, pnetrapalli@google.com |
| Pseudocode | Yes | Algorithm 1: Quasi Newton Method ... Algorithm 2: SGD-RER |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | No | Synthetic data: We sample data from NLDS(A*, µ, φ) where µ ∼ N(0, σ²I) and A* ∈ R^{d×d} is generated from the "RandBiMod" distribution. |
| Dataset Splits | No | The paper mentions generating synthetic data for experiments but does not explicitly describe train, validation, or test splits. It refers to a total 'horizon T = 10^5' but not how it was partitioned. |
| Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] The experiments run on a standard computer within 1 min. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | Algorithm Parameters: We set B = 240 and u = 10 for the buffer size and gap size respectively for both SGD-RER and SGD-ER and use full averaging (i.e., θ = 0 in Algorithm 2). We set the step size γ = 5 log T / T for SGD, SGD-RER, and SGD-ER and γ_newton = 0.2 and γ_GLMtron = 0.017. |
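
The synthetic setup quoted in the table can be sketched roughly as follows. This is a minimal illustration, not the authors' released code. It assumes the NLDS recursion takes the form X_{t+1} = φ(A X_t) + η_t with Gaussian noise η_t, that `log` in the step size is the natural logarithm, and that consecutive buffers occupy B + u samples each. The leaky ReLU slope, the dimension `d`, and the scaled Gaussian matrix standing in for `A` are hypothetical placeholders; the "RandBiMod" distribution is not reproduced here.

```python
import math
import numpy as np


def leaky_relu(x, slope=0.5):
    # slope=0.5 is a hypothetical value; the table does not state the slope.
    return np.where(x >= 0, x, slope * x)


def sample_nlds(A, T, sigma=1.0, phi=leaky_relu, seed=0):
    """Simulate X_{t+1} = phi(A X_t) + eta_t with eta_t ~ N(0, sigma^2 I).

    The recursion form and the zero initial state are assumptions.
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    X = np.zeros((T + 1, d))
    for t in range(T):
        noise = sigma * rng.standard_normal(d)
        X[t + 1] = phi(A @ X[t]) + noise
    return X


def step_size(T):
    # gamma = 5 log T / T, reading log as the natural logarithm (assumption).
    return 5.0 * math.log(T) / T


def num_buffers(T, B=240, u=10):
    # Each buffer of B samples is followed by a gap of u samples (assumption).
    return T // (B + u)


d = 5  # hypothetical dimension
G = np.random.default_rng(1).standard_normal((d, d))
# Stand-in system matrix with spectral norm 0.9, NOT the RandBiMod distribution.
A = 0.9 * G / np.linalg.norm(G, 2)

X = sample_nlds(A, T=1000)
gamma = step_size(10**5)
n_buf = num_buffers(10**5)
```

Scaling the stand-in matrix to spectral norm below 1 keeps the simulated trajectory stable, since the leaky ReLU with slope at most 1 is 1-Lipschitz.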