Uniform Convergence of Gradients for Non-Convex Learning and Optimization

Authors: Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | The goal of the present work is to introduce learning-theoretic tools that broadly improve our understanding of when and why gradient-based methods succeed for non-convex learning problems. Our precise technical contributions are as follows: we bring vector-valued Rademacher complexities [30] and associated vector-valued contraction principles to bear on the analysis of uniform convergence for gradients. (A brief formal sketch of these quantities appears after the table.)
Researcher Affiliation | Academia | Dylan J. Foster (Cornell University, djfoster@cornell.edu); Ayush Sekhari (Cornell University, sekhari@cs.cornell.edu); Karthik Sridharan (Cornell University, sridharan@cs.cornell.edu)
Pseudocode | No | The paper describes a 'meta-algorithm' and references other algorithms, but it does not include any formal pseudocode blocks or algorithm listings labeled, for example, 'Algorithm 1'. (An illustrative sketch of such a meta-algorithm appears after the table.)
Open Source Code | No | The paper contains no statement about releasing open-source code for the described methodology and provides no link to a code repository.
Open Datasets | No | The paper is theoretical and focuses on mathematical analysis of learning problems. It conducts no empirical experiments on specific datasets and therefore provides no access information for a publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or training/validation/test splits.
Hardware Specification | No | The paper is purely theoretical and describes no computational experiments, so no specific hardware used to run experiments is mentioned.
Software Dependencies | No | The paper is purely theoretical and describes no computational experiments. It lists no versioned software dependencies that would be required for replication.
Experiment Setup | No | The paper is purely theoretical and does not describe any experimental setup, hyperparameters, or training configuration.
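
For the Research Type row above: the following is a minimal sketch, in standard notation rather than the paper's own, of a vector-valued Rademacher complexity and the associated contraction principle that the quoted contribution refers to. The normalization, the class name, and the constant are assumptions taken from the standard (Maurer-style) statements, not quoted from the paper.

% Vector-valued Rademacher complexity of a class F of maps f : X -> R^K,
% evaluated on a sample x_1, ..., x_n (standard definition; normalization assumed).
\[
\mathfrak{R}_n(\mathcal{F}) \;=\; \mathbb{E}_{\epsilon}\, \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} \epsilon_{ik}\, f_k(x_i),
\qquad \epsilon_{ik} \ \text{i.i.d. uniform on } \{\pm 1\}.
\]
% Vector-valued contraction (Maurer-style): if each h_i : R^K -> R is L-Lipschitz
% with respect to the Euclidean norm, then
\[
\mathbb{E}_{\epsilon}\, \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} \epsilon_i\, h_i(f(x_i))
\;\le\; \sqrt{2}\, L\;
\mathbb{E}_{\epsilon}\, \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} \sum_{k=1}^{K} \epsilon_{ik}\, f_k(x_i).
\]

Roughly speaking, bounds of this type are applied with the vector-valued map playing the role of a gradient, which is how uniform convergence of gradients can be controlled; the paper's exact constants and assumptions should be taken from the paper itself.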
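
For the Pseudocode row above: since the paper gives no algorithm listing, the following is only a hypothetical, minimal sketch of the kind of gradient-based meta-algorithm the report alludes to, namely running a stationary-point finder (plain gradient descent here) on the empirical risk until the empirical gradient is small, so that uniform convergence of gradients can then certify approximate stationarity for the population risk. The loss, function names, and stopping rule below are illustrative assumptions, not the authors' procedure.

import numpy as np

def empirical_grad(w, X, y):
    """Gradient of a smooth non-convex empirical risk.

    Illustrative loss: (1/n) * sum_i (sigmoid(x_i . w) - y_i)^2, a standard
    non-convex example; NOT the loss class studied in the paper.
    """
    z = X @ w
    s = 1.0 / (1.0 + np.exp(-z))           # sigmoid predictions
    # d/dw of the mean squared error composed with the sigmoid
    return X.T @ (2.0 * (s - y) * s * (1.0 - s)) / len(y)

def find_stationary_point(X, y, lr=0.1, eps=1e-3, max_iter=10_000, seed=0):
    """Hypothetical meta-algorithm instantiation: gradient descent run until
    the empirical gradient norm drops below eps, i.e. an approximate
    stationary point of the empirical risk."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(max_iter):
        g = empirical_grad(w, X, y)
        if np.linalg.norm(g) <= eps:        # empirical eps-stationarity reached
            break
        w -= lr * g
    return w

# Toy usage: random data stands in for a real sample.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = (rng.random(200) > 0.5).astype(float)
    w_hat = find_stationary_point(X, y)
    print("empirical gradient norm:", np.linalg.norm(empirical_grad(w_hat, X, y)))

The sketch only produces the empirical stationarity certificate; uniform convergence of gradients is what would let that certificate transfer to approximate stationarity of the population risk.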