Demystifying SGD with Doubly Stochastic Gradients

Authors: Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4. Simulation. We evaluate the insight on the tradeoff between $b$ and $m$ for correlated estimators on a synthetic problem. ... Results: The results are shown in Fig. 1.
Researcher Affiliation | Academia | 1 Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, PA, U.S.A.; 2 KAIST, Daejeon, Republic of Korea; 3 Halıcıoğlu Data Science Institute, University of California San Diego, San Diego, CA, U.S.A.
Pseudocode | Yes | 3.2.1. ALGORITHM. Doubly SGD-RR. The algorithm is stated as follows: ❶ Reshuffle and partition the gradient estimators into minibatches of size $b$ as $P = \{P_1, \ldots, P_p\}$, where $p = n/b$ is the number of partitions or minibatches. ❷ Perform gradient descent for $i = 1, \ldots, p$ steps as $\boldsymbol{x}_{t+1}^{k} = \Pi_{\mathcal{X}}\big(\boldsymbol{x}_{t}^{k} - \gamma_t \, g_{P_i}(\boldsymbol{x}_{t}^{k})\big)$. ❸ $k \leftarrow k + 1$ and go back to step ❶. (A Python sketch of this procedure is given after the table.)
Open Source Code | No | The paper mentions: "See the implementation at https://github.com/zixu1986/Doubly_Stochastic_Gradients". However, this is given as an example of a related work's implementation to illustrate shared features across a batch, not as their own code release for the work described in the paper.
Open Datasets | No | The paper states: "In particular, we set $f_i(\boldsymbol{x}; \boldsymbol{\eta}) = \frac{L_i}{2} \lVert \boldsymbol{x} - \boldsymbol{x}_i^* + \boldsymbol{\eta} \rVert^2$ ... where the smoothness constants $L_i \sim \mathrm{Inv\text{-}Gamma}(1/2, 1/2)$ and the stationary points $\boldsymbol{x}_i^* \sim \mathcal{N}(\boldsymbol{0}_d, s^2 \mathbf{I}_d)$ are sampled randomly..." This describes a synthetic problem setup rather than the use of a public dataset with explicit access information.
Dataset Splits | No | The paper describes a synthetic problem and its setup but does not specify any training, validation, or test dataset splits in the typical machine learning sense, as it generates data for simulation.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for its simulations or experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers that would be needed for reproducibility.
Experiment Setup | Yes | Setup: We evaluate the insight on the tradeoff between $b$ and $m$ for correlated estimators on a synthetic problem. In particular, we set $f_i(\boldsymbol{x}; \boldsymbol{\eta}) = \frac{L_i}{2} \lVert \boldsymbol{x} - \boldsymbol{x}_i^* + \boldsymbol{\eta} \rVert^2$, where the smoothness constants $L_i \sim \mathrm{Inv\text{-}Gamma}(1/2, 1/2)$ and the stationary points $\boldsymbol{x}_i^* \sim \mathcal{N}(\boldsymbol{0}_d, s^2 \mathbf{I}_d)$ are sampled randomly; here $\boldsymbol{0}_d$ is a vector of $d$ zeros and $\mathbf{I}_d$ is the $d \times d$ identity matrix. Then, we compute the gradient variance at the global optimum, corresponding to computing the BV (Definition 2) constant. Note that $s^2$ here corresponds to the heterogeneity of the data. We make the estimators dependent by sharing $\boldsymbol{\eta}_1, \ldots, \boldsymbol{\eta}_m$ across the batch. (A second sketch after the table reproduces this setup in code.)
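
For concreteness, here is a minimal Python sketch of the Doubly SGD-RR loop quoted in the Pseudocode row (steps ❶ through ❸). The estimator interface (grad_fns callables that draw their own noise), the step-size schedule gamma, and the default identity projection are illustrative assumptions, not details fixed by the quoted text.

```python
import numpy as np

def doubly_sgd_rr(x0, grad_fns, b, gamma, num_epochs, project=lambda x: x, seed=0):
    """Sketch of Doubly SGD-RR: each epoch, reshuffle the n doubly stochastic
    gradient estimators, partition them into p = n/b minibatches, and take one
    projected gradient step per minibatch."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = len(grad_fns)
    assert n % b == 0, "assume b divides n for simplicity"
    p = n // b                        # number of partitions/minibatches
    t = 0
    for k in range(num_epochs):       # step ❸: k <- k + 1, go back to ❶
        perm = rng.permutation(n)     # step ❶: reshuffle ...
        for i in range(p):            # step ❷: p projected SGD steps
            batch = perm[i * b:(i + 1) * b]   # ... and partition into P_i
            # Minibatch estimator g_{P_i}(x): average of the b estimators,
            # each drawing fresh randomness eta internally via rng.
            g = np.mean([grad_fns[j](x, rng) for j in batch], axis=0)
            x = project(x - gamma(t) * g)     # gamma(t) is the step size γ_t
            t += 1
    return x
```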
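
And here is a minimal sketch of the synthetic variance experiment quoted in the Experiment Setup row. The problem sizes (n, d, s2), the noise law $\boldsymbol{\eta} \sim \mathcal{N}(\boldsymbol{0}_d, \mathbf{I}_d)$, and the Monte Carlo estimate of the gradient variance at the optimum are assumptions made for illustration; the quoted text does not pin them down.

```python
import numpy as np
from scipy.stats import invgamma

rng = np.random.default_rng(0)
n, d, s2 = 100, 10, 1.0          # assumed sizes; not specified in the quote

# Synthetic problem: f_i(x; eta) = (L_i / 2) * ||x - x_i* + eta||^2
L = invgamma.rvs(a=0.5, scale=0.5, size=n, random_state=rng)  # L_i ~ Inv-Gamma(1/2, 1/2)
x_star = rng.normal(0.0, np.sqrt(s2), size=(n, d))            # x_i* ~ N(0_d, s^2 I_d)
x_opt = (L[:, None] * x_star).sum(0) / L.sum()                # global optimum of the average objective

def minibatch_grad(x, batch, m, correlated=True):
    """Doubly stochastic minibatch gradient with m noise samples per estimator.
    If correlated, the same eta_1, ..., eta_m are shared across the batch
    (assuming eta ~ N(0, I_d); the quote does not state eta's distribution)."""
    if correlated:
        eta = rng.normal(size=(m, d))                    # shared across the batch
        etas = [eta] * len(batch)
    else:
        etas = [rng.normal(size=(m, d)) for _ in batch]  # independent per estimator
    grads = [L[i] * (x - x_star[i] + e).mean(0) for i, e in zip(batch, etas)]
    return np.mean(grads, axis=0)

# Monte Carlo estimate of the gradient variance at the optimum for one (b, m)
# pair; sweeping (b, m) at a fixed per-step budget b * m exposes the tradeoff.
b, m, reps = 10, 5, 2000
samples = np.stack([minibatch_grad(x_opt, rng.choice(n, b, replace=False), m)
                    for _ in range(reps)])
print("trace of gradient covariance at x*:", samples.var(0).sum())
```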