Efficiency Ordering of Stochastic Gradient Descent
Authors: Jie Hu, Vishwaraj Doshi, Do-Young Eun
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically validate our theoretical analysis. We select two convex objective functions as follows: $f(\theta) = \frac{1}{n}\sum_{i=1}^{n} \log(1+\exp(-y_i x_i^T \theta)) + \frac{1}{2}\|\theta\|_2^2$, $\hat{f}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \theta^T (a_i a_i^T + D_i)\theta + b^T \theta$. (14) For $l_2$-regularized logistic regression $f(\theta)$, we choose the dataset CIFAR-10 [40], where $n$ is the total number of data points. Here, $x_i \in \mathbb{R}^{108}$ is the vector flattened from the cropped image $i$ with shape (6, 6, 3), and $y_i \in \mathbb{R}$ is the label. For the sum-of-non-convex functions $\hat{f}(\theta)$, which is based on the experiment setup in [31, 4], we generate random vectors $a_i$, $b$ and matrices $D_i$ which ensure the invertibility of the matrix $\sum_{i=1}^n a_i a_i^T$ and $\sum_{i=1}^n D_i = 0$ (details are deferred to Appendix I.2 [1]). For both experiments, we assign a data point to each node $i$ on the general graph Dolphins (62 nodes) [63]. We set the step size in the SGD algorithm as $1/t^{0.9}$, and use the MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]$ to measure the relative performance of different inputs. We also employ the scaled MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]/\gamma_t$ to empirically show its relationship to the CLT result (9). |
| Researcher Affiliation | Collaboration | Jie Hu¹, Vishwaraj Doshi², Do-Young Eun¹ (¹Department of Electrical and Computer Engineering, North Carolina State University; ²Data Science and Advanced Analytics, IQVIA) |
| Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | For $l_2$-regularized logistic regression $f(\theta)$, we choose the dataset CIFAR-10 [40], where $n$ is the total number of data points. Here, $x_i \in \mathbb{R}^{108}$ is the vector flattened from the cropped image $i$ with shape (6, 6, 3), and $y_i \in \mathbb{R}$ is the label. For the sum-of-non-convex functions $\hat{f}(\theta)$, which is based on the experiment setup in [31, 4], we generate random vectors $a_i$, $b$ and matrices $D_i$ which ensure the invertibility of the matrix $\sum_{i=1}^n a_i a_i^T$ and $\sum_{i=1}^n D_i = 0$ (details are deferred to Appendix I.2 [1]). For both experiments, we assign a data point to each node $i$ on the general graph Dolphins (62 nodes) [63]. |
| Dataset Splits | No | The paper does not explicitly describe training/validation/test splits beyond mentioning dataset usage. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | For $l_2$-regularized logistic regression $f(\theta)$, we choose the dataset CIFAR-10 [40], where $n$ is the total number of data points. Here, $x_i \in \mathbb{R}^{108}$ is the vector flattened from the cropped image $i$ with shape (6, 6, 3), and $y_i \in \mathbb{R}$ is the label. For the sum-of-non-convex functions $\hat{f}(\theta)$, which is based on the experiment setup in [31, 4], we generate random vectors $a_i$, $b$ and matrices $D_i$ which ensure the invertibility of the matrix $\sum_{i=1}^n a_i a_i^T$ and $\sum_{i=1}^n D_i = 0$ (details are deferred to Appendix I.2 [1]). For both experiments, we assign a data point to each node $i$ on the general graph Dolphins (62 nodes) [63]. We set the step size in the SGD algorithm as $1/t^{0.9}$, and use the MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]$ to measure the relative performance of different inputs. We also employ the scaled MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]/\gamma_t$ to empirically show its relationship to the CLT result (9). |
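
As a companion to the Experiment Setup row above, here is a minimal runnable sketch of the quoted protocol: SGD on the $l_2$-regularized logistic regression objective $f(\theta)$ from Eq. (14), with the paper's step size $1/t^{0.9}$ and the MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]$ as the metric, comparing an i.i.d. input sequence against a random-walk input over a graph. This is not the authors' code: synthetic Gaussian features replace the flattened CIFAR-10 crops, a 62-node ring replaces the Dolphins graph, $\theta^*$ is approximated by full-batch gradient descent, and the names `grad_i`, `run_sgd`, `iid_input`, and `walk_input` are illustrative.

```python
# Minimal sketch of the quoted setup, not the authors' code: synthetic
# Gaussian features stand in for the flattened 6x6x3 CIFAR-10 crops
# (d = 108), and a random walk on a ring stands in for the Dolphins
# graph; theta* is approximated by full-batch gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n, d = 62, 108                            # one data point per graph node
X = rng.normal(size=(n, d)) / np.sqrt(d)  # stand-in features
y = rng.choice([-1.0, 1.0], size=n)       # stand-in binary labels

def grad_i(theta, i):
    """Stochastic gradient of f in Eq. (14) at data point i:
    logistic loss gradient plus the l2-regularizer gradient."""
    margin = y[i] * (X[i] @ theta)
    return -y[i] * X[i] / (1.0 + np.exp(margin)) + theta

# Approximate the minimizer theta* by full-batch gradient descent.
theta_star = np.zeros(d)
for _ in range(3000):
    theta_star -= 0.5 * np.mean([grad_i(theta_star, i) for i in range(n)], axis=0)

def run_sgd(sampler, T=5000, reps=10):
    """Average MSE ||theta_t - theta*||_2^2 over `reps` runs,
    using the paper's step size 1/t^0.9."""
    mse = np.zeros(T)
    for _ in range(reps):
        theta, i = np.zeros(d), 0
        for t in range(1, T + 1):
            i = sampler(i)                      # next index in the input sequence
            theta -= grad_i(theta, i) / t**0.9  # SGD step with 1/t^0.9
            mse[t - 1] += np.sum((theta - theta_star) ** 2)
    return mse / reps

iid_input = lambda i: rng.integers(n)                 # i.i.d. uniform sampling
walk_input = lambda i: (i + rng.choice([-1, 1])) % n  # random walk on a ring

mse_iid = run_sgd(iid_input)
mse_walk = run_sgd(walk_input)
print(f"final MSE  i.i.d.: {mse_iid[-1]:.3e}   random walk: {mse_walk[-1]:.3e}")
```

Both samplers share the uniform stationary distribution, so they target the same objective; comparing their MSE curves mirrors the paper's comparison of input sequences, while the efficiency ordering itself concerns the asymptotic covariance in the CLT result (9).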