Demystifying SGD with Doubly Stochastic Gradients

Authors: Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4. Simulation. We evaluate the insight on the tradeoff between $b$ and $m$ for correlated estimators on a synthetic problem. ... Results: The results are shown in Fig. 1.
Researcher Affiliation | Academia | 1 Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, PA, U.S.A.; 2 KAIST, Daejeon, Republic of Korea; 3 Halıcıoğlu Data Science Institute, University of California San Diego, San Diego, CA, U.S.A.
Pseudocode | Yes | 3.2.1. ALGORITHM. Doubly SGD-RR. The algorithm is stated as follows: ❶ Reshuffle and partition the gradient estimators into minibatches of size $b$ as $P = \{P_1, \ldots, P_p\}$, where $p = n/b$ is the number of partitions or minibatches. ❷ Perform gradient descent for $i = 1, \ldots, p$ steps as $\boldsymbol{x}_{t+1}^{k} = \Pi_{\mathcal{X}}\big(\boldsymbol{x}_{t}^{k} - \gamma_t \, g_{P_i}(\boldsymbol{x}_{t}^{k})\big)$. ❸ $k \leftarrow k + 1$ and go back to step ❶. (A Python sketch of this procedure is given after the table.)
Open Source Code | No | The paper mentions: "See the implementation at https://github.com/zixu1986/Doubly_Stochastic_Gradients". However, this is given as an example of a related work's implementation to illustrate shared features across a batch, not as their own code release for the work described in the paper.
Open Datasets | No | The paper states: "In particular, we set $f_i(\boldsymbol{x}; \boldsymbol{\eta}) = \frac{L_i}{2} \lVert \boldsymbol{x} - \boldsymbol{x}_i^* + \boldsymbol{\eta} \rVert^2$ ... where the smoothness constants $L_i \sim \mathrm{Inv\text{-}Gamma}(1/2, 1/2)$ and the stationary points $\boldsymbol{x}_i^* \sim \mathcal{N}(\boldsymbol{0}_d, s^2 \mathbf{I}_d)$ are sampled randomly..." This describes a synthetic problem setup rather than the use of a public dataset with explicit access information.
Dataset Splits | No | The paper describes a synthetic problem and its setup but does not specify any training, validation, or test dataset splits in the typical machine learning sense, as it generates data for simulation.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for its simulations or experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers that would be needed for reproducibility.
Experiment Setup | Yes | Setup: We evaluate the insight on the tradeoff between $b$ and $m$ for correlated estimators on a synthetic problem. In particular, we set $f_i(\boldsymbol{x}; \boldsymbol{\eta}) = \frac{L_i}{2} \lVert \boldsymbol{x} - \boldsymbol{x}_i^* + \boldsymbol{\eta} \rVert^2$, where the smoothness constants $L_i \sim \mathrm{Inv\text{-}Gamma}(1/2, 1/2)$ and the stationary points $\boldsymbol{x}_i^* \sim \mathcal{N}(\boldsymbol{0}_d, s^2 \mathbf{I}_d)$ are sampled randomly; here $\boldsymbol{0}_d$ is a vector of $d$ zeros and $\mathbf{I}_d$ is the $d \times d$ identity matrix. Then, we compute the gradient variance at the global optimum, corresponding to computing the BV (Definition 2) constant. Note that $s^2$ here corresponds to the heterogeneity of the data. We make the estimators dependent by sharing $\boldsymbol{\eta}_1, \ldots, \boldsymbol{\eta}_m$ across the batch. (A second sketch after the table reproduces this setup in code.)
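
For concreteness, here is a minimal Python sketch of the Doubly SGD-RR loop quoted in the Pseudocode row (steps ❶ through ❸). The estimator interface (grad_fns callables that draw their own noise), the step-size schedule gamma, and the default identity projection are illustrative assumptions, not details fixed by the quoted text.

```python
import numpy as np

def doubly_sgd_rr(x0, grad_fns, b, gamma, num_epochs, project=lambda x: x, seed=0):
    """Sketch of Doubly SGD-RR: each epoch, reshuffle the n doubly stochastic
    gradient estimators, partition them into p = n/b minibatches, and take one
    projected gradient step per minibatch."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = len(grad_fns)
    assert n % b == 0, "assume b divides n for simplicity"
    p = n // b                        # number of partitions/minibatches
    t = 0
    for k in range(num_epochs):       # step ❸: k <- k + 1, go back to ❶
        perm = rng.permutation(n)     # step ❶: reshuffle ...
        for i in range(p):            # step ❷: p projected SGD steps
            batch = perm[i * b:(i + 1) * b]   # ... and partition into P_i
            # Minibatch estimator g_{P_i}(x): average of the b estimators,
            # each drawing fresh randomness eta internally via rng.
            g = np.mean([grad_fns[j](x, rng) for j in batch], axis=0)
            x = project(x - gamma(t) * g)     # gamma(t) is the step size γ_t
            t += 1
    return x
```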
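
And here is a minimal sketch of the synthetic variance experiment quoted in the Experiment Setup row. The problem sizes (n, d, s2), the noise law $\boldsymbol{\eta} \sim \mathcal{N}(\boldsymbol{0}_d, \mathbf{I}_d)$, and the Monte Carlo estimate of the gradient variance at the optimum are assumptions made for illustration; the quoted text does not pin them down.

```python
import numpy as np
from scipy.stats import invgamma

rng = np.random.default_rng(0)
n, d, s2 = 100, 10, 1.0          # assumed sizes; not specified in the quote

# Synthetic problem: f_i(x; eta) = (L_i / 2) * ||x - x_i* + eta||^2
L = invgamma.rvs(a=0.5, scale=0.5, size=n, random_state=rng)  # L_i ~ Inv-Gamma(1/2, 1/2)
x_star = rng.normal(0.0, np.sqrt(s2), size=(n, d))            # x_i* ~ N(0_d, s^2 I_d)
x_opt = (L[:, None] * x_star).sum(0) / L.sum()                # global optimum of the average objective

def minibatch_grad(x, batch, m, correlated=True):
    """Doubly stochastic minibatch gradient with m noise samples per estimator.
    If correlated, the same eta_1, ..., eta_m are shared across the batch
    (assuming eta ~ N(0, I_d); the quote does not state eta's distribution)."""
    if correlated:
        eta = rng.normal(size=(m, d))                    # shared across the batch
        etas = [eta] * len(batch)
    else:
        etas = [rng.normal(size=(m, d)) for _ in batch]  # independent per estimator
    grads = [L[i] * (x - x_star[i] + e).mean(0) for i, e in zip(batch, etas)]
    return np.mean(grads, axis=0)

# Monte Carlo estimate of the gradient variance at the optimum for one (b, m)
# pair; sweeping (b, m) at a fixed per-step budget b * m exposes the tradeoff.
b, m, reps = 10, 5, 2000
samples = np.stack([minibatch_grad(x_opt, rng.choice(n, b, replace=False), m)
                    for _ in range(reps)])
print("trace of gradient covariance at x*:", samples.var(0).sum())
```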