Characterization of Excess Risk for Locally Strongly Convex Population Risk

Authors: Mingyang Yi, Ruoyu Wang, Zhi-Ming Ma

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | We establish upper bounds for the expected excess risk of models trained by proper iterative algorithms which approximate the local minima. Unlike results built upon global strong convexity or global growth conditions, e.g., the PL-inequality, we only require the population risk to be locally strongly convex around its local minima. Concretely, our bound under convex problems is of order O(1/n). For non-convex problems with d model parameters such that d/n is smaller than a threshold independent of n, the order of O(1/n) can be maintained if the empirical risk has no spurious local minima with high probability. Moreover, the bound for non-convex problems becomes O(1/√n) without such an assumption. Our results are derived via algorithmic stability and characterization of the empirical risk's landscape. |
| Researcher Affiliation | Collaboration | Mingyang Yi (1,2,3), Ruoyu Wang (1,2), Zhi-Ming Ma (1,2). (1) University of Chinese Academy of Sciences; (2) Academy of Mathematics and Systems Science, Chinese Academy of Sciences; (3) Huawei Noah's Ark Lab |
| Pseudocode | No | The paper provides mathematical update rules for algorithms such as GD and SGD (equations 8 and 9) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper discusses theoretical constructs and examples of problem types that satisfy its assumptions (e.g., PCA, ICA, matrix completion) but does not conduct empirical studies using specific, named public datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe any empirical experiments; it therefore provides no details on training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments; it therefore provides no details on the hardware used. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and does not detail specific experimental setups, hyperparameters, or training configurations for empirical evaluations. |
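
Since the paper gives only mathematical update rules for GD and SGD and no pseudocode, the following is a minimal sketch of the textbook forms of those two iterations. It is not taken from the paper: equations 8 and 9 are not reproduced in this summary, so the function names, the step sizes, and the toy quadratic risk below are all assumptions chosen for illustration.

```python
import random

def gd(grad, w0, lr, steps):
    """Textbook gradient descent: w <- w - lr * grad(w) (assumed form)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def sgd(sample_grad, data, w0, lr, steps, seed=0):
    """Textbook SGD: each step uses the gradient at one uniformly drawn sample."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        z = rng.choice(data)
        w = w - lr * sample_grad(w, z)
    return w

# Hypothetical toy empirical risk: (1/n) * sum_i 0.5 * (w - z_i)^2.
# It is strongly convex and minimized at the sample mean.
data = [1.0, 2.0, 3.0, 6.0]
sample_mean = sum(data) / len(data)

def full_grad(w):
    # Gradient of the empirical risk over the whole sample.
    return sum(w - z for z in data) / len(data)

def one_sample_grad(w, z):
    # Gradient of the loss at a single sample z.
    return w - z

w_gd = gd(full_grad, w0=0.0, lr=0.5, steps=50)
w_sgd = sgd(one_sample_grad, data, w0=0.0, lr=0.01, steps=5000)
```

On this toy problem `w_gd` converges to the sample mean, while `w_sgd` with a constant step size only fluctuates around it; the paper's excess-risk bounds concern the gap between such empirical solutions and the population minimizer, which this sketch does not measure.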