Never Go Full Batch (in Stochastic Convex Optimization)

Authors: Idan Amir, Yair Carmon, Tomer Koren, Roi Livni

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide a new separation result showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within ε after O(1/ε²) iterations, full-batch methods either need at least Ω(1/ε⁴) iterations or exhibit a dimension-dependent sample complexity. Our arguments and proofs here are more challenging, as we need to reason about a general family of algorithms, and not about a specific algorithm whose trajectory can be analyzed directly. (See the illustrative sketch following this table.)
Researcher Affiliation | Collaboration | Idan Amir (Department of EE, Tel Aviv University) idanamir@mail.tau.ac.il; Yair Carmon (Blavatnik School of CS, Tel Aviv University) ycarmon@tauex.tau.ac.il; Tomer Koren (Blavatnik School of CS, Tel Aviv University, and Google Research) tkoren@tauex.tau.ac.il; Roi Livni (Department of EE, Tel Aviv University) rlivni@tauex.tau.ac.il
Pseudocode | No | The paper contains mathematical definitions and formulations, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that source code for the described methodology is publicly available, nor does it link to a code repository.
Open Datasets | No | This is a theoretical paper that conducts no experiments with datasets, so there is no information on publicly available training datasets.
Dataset Splits | No | As a theoretical paper with no experiments, it provides no information on dataset splits.
Hardware Specification | No | As a theoretical paper with no experiments, it specifies no hardware.
Software Dependencies | No | As a theoretical paper with no experiments, it lists no software dependencies or version numbers.
Experiment Setup | No | The paper focuses on mathematical proofs and analysis, and thus includes no details about experimental setup, hyperparameters, or system-level training settings.
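
As a purely illustrative sketch, the Python snippet below contrasts the two algorithm families that the separation result in the Research Type row compares: one-sample SGD, which draws a fresh example at each step, and full-batch gradient descent, whose every iteration averages gradients over all n samples. It is not the paper's adversarial lower-bound construction; the least-squares objective, data, step size, and iteration counts are all hypothetical choices for demonstration.

```python
import numpy as np

# Purely illustrative: contrasts one-sample SGD with full-batch GD on a
# simple stochastic convex problem (half-mean-squared loss, least squares).
# This is NOT the paper's lower-bound construction; the data, step size,
# and iteration counts below are hypothetical choices.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

def grad(w, idx):
    """Gradient of the half-mean-squared loss over the samples in idx."""
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ w - yi) / len(idx)

def sgd(steps, lr=0.01):
    """One-sample SGD: each iteration uses a single randomly drawn example."""
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * grad(w, [rng.integers(n)])
    return w

def full_batch_gd(steps, lr=0.01):
    """Full-batch GD: each iteration averages gradients over all n samples."""
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * grad(w, np.arange(n))
    return w

for name, w in [("SGD", sgd(1000)), ("full-batch GD", full_batch_gd(1000))]:
    print(f"{name}: distance to w*: {np.linalg.norm(w - w_star):.4f}")
```

On a benign instance like this one, both methods converge quickly; the paper's separation concerns worst-case stochastic convex instances, where any full-batch method needs Ω(1/ε⁴) iterations (or dimension-dependent sample complexity) to reach population risk ε, while SGD needs only O(1/ε²).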