Efficiency Ordering of Stochastic Gradient Descent

Authors: Jie Hu, Vishwaraj Doshi, Do-Young Eun

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we empirically validate our theoretical analysis. We select two convex objective functions as follows: $f(\theta) = \frac{1}{n}\sum_{i=1}^{n} \log(1+\exp(-y_i x_i^T \theta)) + \frac{1}{2}\|\theta\|_2^2$, $\hat{f}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \theta^T (a_i a_i^T + D_i)\theta + b^T \theta$ (14). For $l_2$-regularized logistic regression $f(\theta)$, we choose the dataset CIFAR-10 [40], where $n$ is the total number of data points. Here, $x_i \in \mathbb{R}^{108}$ is the vector flattened from the cropped image $i$ with shape (6, 6, 3), and $y_i \in \mathbb{R}$ is the label. For the sum-of-non-convex functions $\hat{f}(\theta)$, which is based on the experiment setup in [31, 4], we generate random vectors $a_i$, $b$ and matrices $D_i$ which ensure the invertibility of the matrix $\sum_{i=1}^{n} a_i a_i^T$ and $\sum_{i=1}^{n} D_i = 0$ (details are deferred to Appendix I.2 [1]). For both experiments, we assign a data point to each node $i$ on the general graph Dolphins (62 nodes) [63]. We set the step size in the SGD algorithm as $1/t^{0.9}$, and use the MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]$ to measure the relative performance of different inputs. We also employ the scaled MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]/\gamma_t$ to empirically show its relationship to the CLT result (9).
Researcher Affiliation | Collaboration | Jie Hu (1), Vishwaraj Doshi (2), Do-Young Eun (1); (1) Department of Electrical and Computer Engineering, North Carolina State University; (2) Data Science and Advanced Analytics, IQVIA
Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks.
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]
Open Datasets | Yes | For $l_2$-regularized logistic regression $f(\theta)$, we choose the dataset CIFAR-10 [40], where $n$ is the total number of data points. Here, $x_i \in \mathbb{R}^{108}$ is the vector flattened from the cropped image $i$ with shape (6, 6, 3), and $y_i \in \mathbb{R}$ is the label. For the sum-of-non-convex functions $\hat{f}(\theta)$, which is based on the experiment setup in [31, 4], we generate random vectors $a_i$, $b$ and matrices $D_i$ which ensure the invertibility of the matrix $\sum_{i=1}^{n} a_i a_i^T$ and $\sum_{i=1}^{n} D_i = 0$ (details are deferred to Appendix I.2 [1]). For both experiments, we assign a data point to each node $i$ on the general graph Dolphins (62 nodes) [63].
Dataset Splits | No | The paper does not explicitly describe training/validation/test splits beyond mentioning dataset usage.
Hardware Specification | No | The paper does not specify the hardware used for experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers.
Experiment Setup | Yes | For $l_2$-regularized logistic regression $f(\theta)$, we choose the dataset CIFAR-10 [40], where $n$ is the total number of data points. Here, $x_i \in \mathbb{R}^{108}$ is the vector flattened from the cropped image $i$ with shape (6, 6, 3), and $y_i \in \mathbb{R}$ is the label. For the sum-of-non-convex functions $\hat{f}(\theta)$, which is based on the experiment setup in [31, 4], we generate random vectors $a_i$, $b$ and matrices $D_i$ which ensure the invertibility of the matrix $\sum_{i=1}^{n} a_i a_i^T$ and $\sum_{i=1}^{n} D_i = 0$ (details are deferred to Appendix I.2 [1]). For both experiments, we assign a data point to each node $i$ on the general graph Dolphins (62 nodes) [63]. We set the step size in the SGD algorithm as $1/t^{0.9}$, and use the MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]$ to measure the relative performance of different inputs. We also employ the scaled MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]/\gamma_t$ to empirically show its relationship to the CLT result (9).
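The experiment setup quoted above can be sketched in code. The following is a minimal, illustrative reconstruction, not the authors' implementation: SGD with step size $\gamma_t = 1/t^{0.9}$ on an $l_2$-regularized logistic loss, estimating the MSE $\mathbb{E}[\|\theta_t - \theta^*\|_2^2]$ and the scaled MSE over independent runs. The random Gaussian features, the dimension `d = 10`, the run counts, and the i.i.d. uniform sampling of data points are stand-in assumptions; the paper uses cropped CIFAR-10 features ($x_i \in \mathbb{R}^{108}$) and compares different stochastic input sequences over the Dolphins graph.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 62, 10                      # 62 data points, one per Dolphins node; d is a stand-in
X = rng.normal(size=(n, d))        # stand-in features (the paper uses cropped CIFAR-10)
y = rng.choice([-1.0, 1.0], size=n)

def full_grad(theta):
    """Gradient of f(theta) = (1/n) sum_i log(1+exp(-y_i x_i^T theta)) + (1/2)||theta||^2."""
    s = y * (X @ theta)
    return -(X * (y / (1.0 + np.exp(s)))[:, None]).mean(axis=0) + theta

# Reference minimizer theta* via deterministic gradient descent (f is strongly convex).
theta_star = np.zeros(d)
for _ in range(5000):
    theta_star -= 0.1 * full_grad(theta_star)

def stoch_grad(theta, i):
    """Unbiased stochastic gradient using the single sample i."""
    s = y[i] * (X[i] @ theta)
    return -y[i] * X[i] / (1.0 + np.exp(s)) + theta

# SGD with gamma_t = 1/t^0.9; MSE averaged over independent runs.
T, runs = 2000, 20
mse = np.zeros(T)
for _ in range(runs):
    theta = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)                    # i.i.d. uniform sampling as a baseline input
        theta -= stoch_grad(theta, i) / t**0.9
        mse[t - 1] += np.sum((theta - theta_star) ** 2)
mse /= runs

# Scaled MSE: E||theta_t - theta*||^2 / gamma_t.
scaled_mse = mse * np.arange(1, T + 1) ** 0.9
```

To compare inputs as in the paper, one would replace the uniform sampling line with the other stochastic input sequences under study (e.g. walks on the Dolphins graph) and compare the resulting MSE curves; if the CLT scaling holds, the scaled MSE should stabilize rather than decay.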