Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD

Authors: Aniket Das, Dheeraj Nagaraj, Soumyabrata Pal, Arun Suggala, Prateek Varshney

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We consider the problem of high-dimensional heavy-tailed statistical estimation in the streaming setting, which is much harder than the traditional batch setting due to memory constraints. We cast this problem as stochastic convex optimization with heavy-tailed stochastic gradients, and prove that the widely used Clipped SGD algorithm attains near-optimal sub-Gaussian statistical rates whenever the second moment of the stochastic gradient noise is finite. Key to our result is a novel iterative refinement strategy for martingale concentration, improving upon the PAC-Bayes approach of Catoni and Giulini [8].
Researcher Affiliation | Collaboration | Aniket Das (Stanford University, aniketd@cs.stanford.edu); Dheeraj Nagaraj (Google DeepMind, dheerajnagaraj@google.com); Soumyabrata Pal (Adobe Research, soumyabratap@adobe.com); Arun Sai Suggala (Google DeepMind, arunss@google.com); Prateek Varshney (Stanford University, vprateek@stanford.edu)
Pseudocode | Yes | Algorithm 1: Clipped Stochastic Gradient Descent (a hedged sketch is given after this table).
Open Source Code | No | The paper does not provide a statement about releasing open-source code or a link to a code repository for the described methodology.
Open Datasets | No | This is a theoretical paper that focuses on mathematical derivations and analyses of algorithms under theoretical assumptions (e.g., heavy-tailed distributions), rather than empirical evaluation on specific datasets. Therefore, no dataset is mentioned or provided.
Dataset Splits | No | This is a theoretical paper and does not conduct experiments with dataset splits for training, validation, or testing.
Hardware Specification | No | This is a theoretical paper and does not describe any specific hardware used for experiments.
Software Dependencies | No | This is a theoretical paper and does not specify software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | This is a theoretical paper and does not describe specific experimental setup details such as hyperparameters for empirical runs. It describes theoretical algorithm parameters (e.g., step sizes, clipping levels) as part of the algorithm design, not as an 'experimental setup' for empirical evaluation.
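For reference, the Pseudocode row above points to Algorithm 1 (Clipped SGD) in the paper. The snippet below is a minimal Python sketch of a generic clipped-SGD loop, not the paper's exact procedure: the function names, the fixed step size and clipping level, and the streaming mean-estimation example are illustrative assumptions, whereas the paper prescribes specific step-size and clipping-level schedules as part of its analysis.

import numpy as np

def clipped_sgd(grad_fn, x0, steps, step_size, clip_level, rng):
    """Generic clipped-SGD loop (sketch).

    grad_fn(x, rng) returns a stochastic gradient at x; the gradient
    noise may be heavy-tailed (the paper only assumes a finite second
    moment of the noise).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x, rng)
        norm = np.linalg.norm(g)
        # Clip the stochastic gradient so its norm is at most clip_level.
        if norm > clip_level:
            g = g * (clip_level / norm)
        x = x - step_size * g
    return x

# Illustrative use (assumed example): streaming mean estimation with
# heavy-tailed samples, cast as SGD on f(x) = E[0.5 * ||x - Z||^2],
# whose stochastic gradient at x is (x - Z) for a fresh sample Z.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 10

    def grad_fn(x, rng):
        # Student-t samples with df=2.5 have finite variance but heavy tails.
        z = rng.standard_t(df=2.5, size=d)
        return x - z

    x_hat = clipped_sgd(grad_fn, x0=np.zeros(d), steps=5000,
                        step_size=0.01, clip_level=5.0, rng=rng)
    print("estimated mean (true mean is 0):", x_hat)

The constants above (step_size=0.01, clip_level=5.0, steps=5000) are placeholders chosen for illustration only; the paper's sub-Gaussian guarantees rely on carefully chosen step sizes and clipping levels rather than these fixed values.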