Characterization of Excess Risk for Locally Strongly Convex Population Risk
Authors: Mingyang Yi, Ruoyu Wang, Zhi-Ming Ma
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish upper bounds for the expected excess risk of models trained by proper iterative algorithms that approximate the local minima. Unlike results built upon global strong convexity or global growth conditions (e.g., the PL inequality), we only require the population risk to be locally strongly convex around its local minima. Concretely, our bound for convex problems is of order O(1/n). For non-convex problems with d model parameters such that d/n is smaller than a threshold independent of n, the order of O(1/n) can be maintained if the empirical risk has no spurious local minima with high probability. Moreover, the bound for non-convex problems becomes O(1/√n) without such an assumption. Our results are derived via algorithmic stability and a characterization of the empirical risk's landscape. |
| Researcher Affiliation | Collaboration | Mingyang Yi (1,2,3), Ruoyu Wang (1,2), Zhi-Ming Ma (1,2). (1) University of Chinese Academy of Sciences; (2) Academy of Mathematics and Systems Science, Chinese Academy of Sciences; (3) Huawei Noah's Ark Lab |
| Pseudocode | No | The paper provides mathematical update rules for algorithms like GD and SGD (equations 8 and 9) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper discusses theoretical constructs and examples of problem types that satisfy its assumptions (e.g., PCA, ICA, matrix completion) but does not conduct empirical studies using specific, named public datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe any empirical experiments, therefore it does not provide details on training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments, thus it provides no details on specific hardware used. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers used for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and does not detail specific experimental setups, hyperparameters, or training configurations for empirical evaluations. |
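
The Pseudocode row notes that the paper gives GD and SGD only as update rules. As a rough illustration, the sketch below writes out the textbook forms of those updates on a toy strongly convex problem (one-dimensional mean estimation with a quadratic loss). It is not a transcription of the paper's equations 8 and 9: the loss, the step sizes, and the names `grad`, `gd`, `sgd`, and `excess_risk` are all assumptions made for this example.

```python
import random

def grad(w, z):
    """Gradient in w of the assumed per-sample loss f(w, z) = 0.5 * (w - z)**2."""
    return w - z

def gd(samples, w0=0.0, lr=0.5, steps=100):
    """Gradient descent: w <- w - lr * (average gradient over all samples)."""
    w = w0
    n = len(samples)
    for _ in range(steps):
        g = sum(grad(w, z) for z in samples) / n
        w = w - lr * g
    return w

def sgd(samples, w0=0.0, steps=1000, seed=0):
    """SGD: w <- w - (1/t) * (gradient at one uniformly drawn sample)."""
    rng = random.Random(seed)
    w = w0
    for t in range(1, steps + 1):
        z = rng.choice(samples)
        w = w - (1.0 / t) * grad(w, z)
    return w

def excess_risk(w, mu):
    """Population excess risk R(w) - R(mu) for this loss when E[z] = mu."""
    return 0.5 * (w - mu) ** 2

if __name__ == "__main__":
    rng = random.Random(0)
    mu = 1.0  # assumed population mean for the toy data
    samples = [rng.gauss(mu, 1.0) for _ in range(200)]
    print("GD  excess risk:", excess_risk(gd(samples), mu))
    print("SGD excess risk:", excess_risk(sgd(samples), mu))
```

On this toy problem GD converges to the sample mean, so its excess risk is driven purely by sampling error; the example says nothing about the non-convex settings the paper analyzes.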