Efficiency Ordering of Stochastic Gradient Descent

Authors: Jie Hu, Vishwaraj Doshi, Do-Young Eun

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically validate our theoretical analysis. We select two convex objective functions as follows: $f(\theta) = \frac{1}{n}\sum_{i=1}^{n} \log(1+\exp(-y_i x_i^T \theta)) + \frac{1}{2}\|\theta\|_2^2$, $\hat{f}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \theta^T (a_i a_i^T + D_i)\theta + b^T \theta$ (14). For $l_2$-regularized logistic regression $f(\theta)$, we choose the dataset CIFAR-10 [40], where $n$ is the total number of data points. Here, $x_i \in \mathbb{R}^{108}$ is the vector flattened from the cropped image $i$ with shape (6, 6, 3), and $y_i \in \mathbb{R}$ is the label. For the sum-of-non-convex functions $\hat{f}(\theta)$, which is based on the experiment setup in [31, 4], we generate random vectors $a_i$, $b$ and matrices $D_i$ which ensure the invertibility of the matrix $\sum_{i=1}^{n} a_i a_i^T$ and $\sum_{i=1}^{n} D_i = 0$ (details are deferred to Appendix I.2 [1]). For both experiments, we assign a data point to each node $i$ on the general graph Dolphins (62 nodes) [63]. We set the step size in the SGD algorithm as $1/t^{0.9}$, and use the MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]$ to measure the relative performance of different inputs. We also employ the scaled MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]/\gamma_t$ to empirically show its relationship to the CLT result (9).
Researcher Affiliation | Collaboration | Jie Hu (1), Vishwaraj Doshi (2), Do-Young Eun (1); (1) Department of Electrical and Computer Engineering, North Carolina State University; (2) Data Science and Advanced Analytics, IQVIA
Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks.
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]
Open Datasets | Yes | For $l_2$-regularized logistic regression $f(\theta)$, we choose the dataset CIFAR-10 [40], where $n$ is the total number of data points. Here, $x_i \in \mathbb{R}^{108}$ is the vector flattened from the cropped image $i$ with shape (6, 6, 3), and $y_i \in \mathbb{R}$ is the label. For the sum-of-non-convex functions $\hat{f}(\theta)$, which is based on the experiment setup in [31, 4], we generate random vectors $a_i$, $b$ and matrices $D_i$ which ensure the invertibility of the matrix $\sum_{i=1}^{n} a_i a_i^T$ and $\sum_{i=1}^{n} D_i = 0$ (details are deferred to Appendix I.2 [1]). For both experiments, we assign a data point to each node $i$ on the general graph Dolphins (62 nodes) [63].
Dataset Splits | No | The paper does not explicitly describe training/validation/test splits beyond mentioning dataset usage.
Hardware Specification | No | The paper does not specify the hardware used for experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers.
Experiment Setup | Yes | For $l_2$-regularized logistic regression $f(\theta)$, we choose the dataset CIFAR-10 [40], where $n$ is the total number of data points. Here, $x_i \in \mathbb{R}^{108}$ is the vector flattened from the cropped image $i$ with shape (6, 6, 3), and $y_i \in \mathbb{R}$ is the label. For the sum-of-non-convex functions $\hat{f}(\theta)$, which is based on the experiment setup in [31, 4], we generate random vectors $a_i$, $b$ and matrices $D_i$ which ensure the invertibility of the matrix $\sum_{i=1}^{n} a_i a_i^T$ and $\sum_{i=1}^{n} D_i = 0$ (details are deferred to Appendix I.2 [1]). For both experiments, we assign a data point to each node $i$ on the general graph Dolphins (62 nodes) [63]. We set the step size in the SGD algorithm as $1/t^{0.9}$, and use the MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]$ to measure the relative performance of different inputs. We also employ the scaled MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]/\gamma_t$ to empirically show its relationship to the CLT result (9).
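The experiment setup quoted above can be sketched in code. The following is a minimal, illustrative reconstruction, not the authors' implementation: SGD with step size $\gamma_t = 1/t^{0.9}$ on an $l_2$-regularized logistic loss, estimating the MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]$ and the scaled MSE over independent runs. The random Gaussian features, the dimension `d = 10`, the run counts, and the i.i.d. uniform sampling of data points are stand-in assumptions; the paper uses cropped CIFAR-10 features ($x_i \in \mathbb{R}^{108}$) and compares different stochastic input sequences over the Dolphins graph.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 62, 10                      # 62 data points, one per Dolphins node; d is a stand-in
X = rng.normal(size=(n, d))        # stand-in features (the paper uses cropped CIFAR-10)
y = rng.choice([-1.0, 1.0], size=n)

def full_grad(theta):
    """Gradient of f(theta) = (1/n) sum_i log(1+exp(-y_i x_i^T theta)) + (1/2)||theta||^2."""
    s = y * (X @ theta)
    return -(X * (y / (1.0 + np.exp(s)))[:, None]).mean(axis=0) + theta

# Reference minimizer theta* via deterministic gradient descent (f is strongly convex).
theta_star = np.zeros(d)
for _ in range(5000):
    theta_star -= 0.1 * full_grad(theta_star)

def stoch_grad(theta, i):
    """Unbiased stochastic gradient using the single sample i."""
    s = y[i] * (X[i] @ theta)
    return -y[i] * X[i] / (1.0 + np.exp(s)) + theta

# SGD with gamma_t = 1/t^0.9; MSE averaged over independent runs.
T, runs = 2000, 20
mse = np.zeros(T)
for _ in range(runs):
    theta = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)                    # i.i.d. uniform sampling as a baseline input
        theta -= stoch_grad(theta, i) / t**0.9
        mse[t - 1] += np.sum((theta - theta_star) ** 2)
mse /= runs

# Scaled MSE: E||theta_t - theta*||^2 / gamma_t.
scaled_mse = mse * np.arange(1, T + 1) ** 0.9
```

To compare inputs as in the paper, one would replace the uniform sampling line with the other stochastic input sequences under study (e.g. walks on the Dolphins graph) and compare the resulting MSE curves; if the CLT scaling holds, the scaled MSE should stabilize rather than decay.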