Generalization Error Bounds for Optimization Algorithms via Stability

Authors: Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We have conducted experiments on both convex and nonconvex problems, and the experimental results verify our theoretical findings."
Researcher Affiliation | Collaboration | 1. Peking University (qimeng13@pku.edu.cn); 2. Beijing Jiaotong University (11271012@bjtu.edu.cn); 3. Microsoft Research ({wche, taifengw, tie-yan.liu}@microsoft.com); 4. Academy of Mathematics and Systems Science, Chinese Academy of Sciences (mazm@amt.ac.cn)
Pseudocode | No | The paper gives the mathematical update rules for GD, SGD, and SVRG in Eqs. (4)-(7), but never presents them in a clearly labeled pseudocode or algorithm block. (A minimal transcription of these update rules is sketched below the table.)
Open Source Code | No | The paper contains no statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | "For logistic regression, we conduct binary classification on benchmark dataset rcv1... As compared to the results for logistic regression and linear regression, we have the following observations on the results of neural networks. (1) The convergence rate is slower and the accuracy is lower. This is because of the nonconvexity and the gap between global optimum and local optimum. (2) SVRG is faster than GD and SGD but the differences between them are not as significant as in the convex cases, which is consistent with our discussions in Section 4 by considering the data size of CIFAR-10."
Dataset Splits | No | The paper mentions "training loss, test loss" and "log-scaled test loss" in Section 5, but does not provide the training/validation/test split details (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | No | The paper does not specify any hardware (e.g., CPU or GPU models, memory, or cloud resources) used to run the experiments.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "For linear regression, we set the step size for GD, SGD, SVRG as 0.032, 0.01/t and 0.005, respectively... For logistic regression, we set the step sizes for GD, SGD, SVRG as 400, 200/t and 1, respectively. For neural networks... we tune the step size for GD, SGD, SVRG and eventually choose 0.03, 0.25/t and 0.001, respectively... The inner loop size for SVRG for convex problems is set as 2n and that for nonconvex problems is set as 5n." The convex objectives also include an L2 regularization term with λ = 1/n. (These values are wired into a runnable sketch below.)
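
As a companion to the Pseudocode row: Eqs. (4)-(7) of the paper are the standard GD, SGD, and SVRG update rules. The following is a minimal sketch, not the authors' code; it assumes NumPy arrays and two hypothetical gradient oracles of our own naming, `grad_full(w)` for the full training gradient and `grad_i(w, i)` for the gradient on example i.

```python
import numpy as np  # parameters and gradients are assumed to be NumPy arrays

def gd(grad_full, w, eta, T):
    """Gradient descent: w <- w - eta * grad F(w), repeated T times."""
    for _ in range(T):
        w = w - eta * grad_full(w)
    return w

def sgd(grad_i, w, eta0, T, n, rng):
    """SGD with the 1/t step-size decay used in the paper's experiments."""
    for t in range(1, T + 1):
        i = rng.integers(n)                     # sample one example uniformly
        w = w - (eta0 / t) * grad_i(w, i)
    return w

def svrg(grad_i, w, eta, S, m, n, rng):
    """SVRG: each outer epoch fixes a snapshot and its full gradient, then
    runs m stochastic steps whose variance is reduced by the snapshot term."""
    for _ in range(S):
        w_snap = w.copy()
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n  # full gradient at snapshot
        for _ in range(m):
            i = rng.integers(n)
            w = w - eta * (grad_i(w, i) - grad_i(w_snap, i) + mu)
    return w
```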
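
And to make the Experiment Setup row concrete, the reported convex-case values can be dropped into the `svrg` sketch above. Everything here except the quoted hyperparameters (SVRG step size 1, inner loop size 2n, λ = 1/n) is our assumption: the data is a small synthetic stand-in for rcv1, the number of outer epochs S = 10 is arbitrary, and the objective is ℓ2-regularized logistic loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20                          # synthetic stand-in for the rcv1 data
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

lam = 1.0 / n                            # L2 regularization strength, lambda = 1/n

def grad_i(w, i):
    """Gradient of f_i(w) = log(1 + exp(-y_i <x_i, w>)) + (lam / 2) ||w||^2."""
    margin = y[i] * X[i].dot(w)
    return -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w

# Paper-reported convex settings: SVRG step size 1, inner loop size m = 2n.
# (On the same objective, GD would use step size 400 and SGD 200/t.)
w = svrg(grad_i, np.zeros(d), eta=1.0, S=10, m=2 * n, n=n, rng=rng)
```

Note that `svrg` here is the function from the previous sketch, so the two blocks are meant to be run together.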