Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Authors: Dheeraj Nagaraj, Xian Wu, Guy Bresler, Prateek Jain, Praneeth Netrapalli

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Simulations confirm our analysis and indicates that our derived rates are tight. We also conducted experiments on data generated using Gaussian AR MC (6). We set (d, σ, ϵ) = (10, 1e 3, 0.01) and choose the buffer size B = 1/ϵ2. We report results averaged over 100 runs. Figure 1a compare the estimation error achieved by SGD, SGD-DD, and the proposed SGDER method. Note that, as expected by our theorems, the decay regime starts at d τmix for SGD-ER and dτmix for SGD which is similar to rate of SGD-DD.
Researcher Affiliation Collaboration Guy Bresler Massachusetts Institute of Technology Cambridge, USA 02139 guy@mit.edu Prateek Jain Microsoft Research Bengaluru, India 560001 prajain@microsoft.com Dheeraj Nagaraj Massachusetts Institute of Technology Cambridge, USA 02139 dheeraj@mit.edu Praneeth Netrapalli Microsoft Research Bengaluru, India 560001 praneeth@microsoft.com Xian Wu Stanford University Stanford, USA 94305 xwu20@stanford.edu
Pseudocode Yes Algorithm 1 SGD-DD; Algorithm 2 Parallel SGD; Algorithm 3 SGD with tail-averaging; Algorithm 4 SGD with Experience Replay (SGD-ER)
Open Source Code No The paper does not provide explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets No We also conducted experiments on data generated using Gaussian AR MC (6). The paper uses simulated data, but does not provide any access information for it.
Dataset Splits No The paper describes conducting experiments with generated data but does not specify any formal train/validation/test dataset splits or their percentages/counts for reproduction.
Hardware Specification No The paper does not provide specific details about the hardware (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies No The paper does not provide specific software dependencies with version numbers (e.g., library names with versions, programming language versions, or solver versions) needed to replicate the experiments.
Experiment Setup Yes We set (d, σ, ϵ) = (10, 1e 3, 0.01) and choose the buffer size B = 1/ϵ2. We report results averaged over 100 runs. Require: (X1, Y1), . . . (XT , YT ) Rd sampled using (6), η: learning rate w N(0, 1), B 1 ϵ7 , u max( 2 ϵ2 log 300000πd B ϵ2 log 300000πd2σ6 ϵ2δ ), S B + u