Never Go Full Batch (in Stochastic Convex Optimization)
Authors: Idan Amir, Yair Carmon, Tomer Koren, Roi Livni
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide a new separation result showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within ε after O(1/ε²) iterations, full-batch methods either need at least Ω(1/ε⁴) iterations or exhibit a dimension-dependent sample complexity. Our arguments and proofs here are more challenging, as we need to reason about a general family of algorithms, and not about a specific algorithm whose trajectory can be analyzed directly. |
| Researcher Affiliation | Collaboration | Idan Amir (Department of EE, Tel Aviv University, idanamir@mail.tau.ac.il); Yair Carmon (Blavatnik School of CS, Tel Aviv University, ycarmon@tauex.tau.ac.il); Tomer Koren (Blavatnik School of CS, Tel Aviv University, and Google Research, tkoren@tauex.tau.ac.il); Roi Livni (Department of EE, Tel Aviv University, rlivni@tauex.tau.ac.il) |
| Pseudocode | No | The paper contains mathematical definitions and formulations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about making the source code for the described methodology publicly available, nor does it provide a link to a code repository. |
| Open Datasets | No | This is a theoretical paper that does not conduct experiments with datasets, therefore there is no information on publicly available training datasets. |
| Dataset Splits | No | This is a theoretical paper that does not conduct experiments with datasets, therefore there is no information on dataset splits. |
| Hardware Specification | No | This is a theoretical paper that does not conduct experiments, therefore no specific hardware specifications are mentioned. |
| Software Dependencies | No | This is a theoretical paper that does not conduct experiments, therefore no specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | This is a theoretical paper that focuses on mathematical proofs and analysis, and thus does not include details about experimental setup, hyperparameters, or system-level training settings. |
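The separation above contrasts two update rules: SGD, which steps on the gradient of a single sample per iteration, and full-batch methods, which step on the gradient of the empirical risk over all samples. The sketch below is purely illustrative of that distinction on a toy 1-D convex objective; it is not the paper's lower-bound construction, and all names, step sizes, and iteration counts are arbitrary choices for this example.

```python
import random

def sgd(samples, lr=0.1, w=0.0):
    """One stochastic gradient step per sample.

    For the toy objective f(w) = E[(w - z)^2 / 2], the per-sample
    gradient is simply (w - z).
    """
    for z in samples:
        w -= lr * (w - z)
    return w

def full_batch_gd(samples, lr=0.1, steps=100, w=0.0):
    """Each iteration uses the gradient of the empirical risk
    averaged over ALL samples (the 'full batch')."""
    n = len(samples)
    for _ in range(steps):
        grad = sum(w - z for z in samples) / n
        w -= lr * grad
    return w

random.seed(0)
data = [random.gauss(1.0, 0.5) for _ in range(1000)]
w_sgd = sgd(data)
w_gd = full_batch_gd(data)
# Both iterates approach the population minimizer (here, the mean, 1.0).
# The paper's separation concerns iteration/sample complexity in high
# dimensions, which this 1-D toy run does not exhibit.
```

On this easy problem both methods succeed; the paper's result is that for a carefully constructed convex problem, any full-batch method needs far more iterations (or dimension-dependent sample complexity) to match SGD's population risk.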