First Order Stochastic Optimization with Oblivious Noise

Authors: Ilias Diakonikolas, Sushrut Karmalkar, Jong Ho Park, Christos Tzamos

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We initiate the study of stochastic optimization with oblivious noise, broadly generalizing the standard heavy-tailed noise setup. In our setting, in addition to random observation noise, the stochastic gradient may be subject to independent oblivious noise, which may not have bounded moments and is not necessarily centered. Our main result is an efficient list-decodable learner that recovers a small list of candidates, at least one of which is close to the true solution. Along the way, we develop a rejection-sampling-based algorithm to perform noisy location estimation, which may be of independent interest.
Researcher Affiliation | Collaboration | Ilias Diakonikolas, Department of Computer Sciences, University of Wisconsin-Madison, ilias@cs.wisc.edu; Sushrut Karmalkar, Department of Computer Sciences, University of Wisconsin-Madison, skarmalkar@wisc.edu; Jongho Park, KRAFTON, jongho.park@krafton.com; Christos Tzamos, Department of Informatics, University of Athens, tzamos@wisc.edu
Pseudocode | Yes | Algorithm 1 One-dimensional Location Estimation: Shift1D(S1, S2, η, σ, α) ... Algorithm 2 Noisy Gradient Optimization: Noisy Grad Desc(α, τ, δ, O, AG, AME) ... Algorithm 3 Inexact Gradient Oracle: Inexact Oracle(x; Oα,σ,f, τ, L0) ... Algorithm 4 One-dimensional Location Estimation: Shift1D(S1, S2, η, σ, α) ... Algorithm 5 High-dimensional Location Estimation: Shift High D(S1, S2, η, σ, α)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.
Open Datasets | No | This is a theoretical paper with no experimental evaluation on datasets, so it does not mention training data or its public availability.
Dataset Splits | No | This is a theoretical paper with no experimental evaluation on datasets, so it does not describe train/validation/test splits.
Hardware Specification | No | The paper is theoretical and does not describe experiments, so it does not specify any hardware used for computations.
Software Dependencies | No | The paper is theoretical and does not describe implementation details that would require specific software dependencies or versions.
Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, hyperparameters, or training configurations.
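To make the noise model concrete: the abstract describes stochastic gradients that carry both bounded observation noise and independent oblivious noise, which may be heavy-tailed (no bounded moments) and non-centered. The sketch below is NOT the paper's algorithm (which is list-decodable and handles corruption rates above 1/2); it is a minimal illustration of the oracle model, paired with a coordinate-wise-median aggregation step as a stand-in for the paper's location-estimation subroutine. All names, constants, and the corruption rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def oblivious_noise(shape):
    # Hypothetical oblivious noise: heavy-tailed (Cauchy, so no bounded
    # moments) and shifted, hence not centered; independent of the query x.
    return rng.standard_cauchy(shape) + 3.0

def noisy_gradient_oracle(x, grad_f, sigma=0.1, corrupt_p=0.15):
    # Oracle sketch: true gradient + Gaussian observation noise, and with
    # probability corrupt_p an additional oblivious corruption.
    g = grad_f(x) + sigma * rng.standard_normal(x.shape)
    if rng.random() < corrupt_p:
        g = g + oblivious_noise(x.shape)
    return g

def robust_gd(grad_f, x0, steps=150, lr=0.1, batch=50):
    # Gradient descent with a robust aggregation of each gradient batch.
    # Coordinate-wise median is a simple stand-in for the paper's
    # rejection-sampling-based location estimation.
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        grads = np.stack([noisy_gradient_oracle(x, grad_f)
                          for _ in range(batch)])
        x = x - lr * np.median(grads, axis=0)
    return x

# Toy objective f(x) = ||x - 1||^2 / 2, so grad f(x) = x - 1.
x_hat = robust_gd(lambda x: x - np.ones(3), np.zeros(3))
```

With a small corruption rate, the per-batch median discards the heavy-tailed outliers and the iterates contract toward the minimizer; a plain batch mean would be destabilized by the Cauchy corruptions, which is exactly why the paper replaces averaging with a dedicated location-estimation step.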