Demystifying SGD with Doubly Stochastic Gradients
Authors: Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Simulation. We evaluate the insight on the tradeoff between b and N for correlated estimators on a synthetic problem. ... Results: The results are shown in Fig. 1. |
| Researcher Affiliation | Academia | 1 Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, PA, U.S.A. 2 KAIST, Daejeon, Republic of Korea 3 Halıcıoğlu Data Science Institute, University of California San Diego, San Diego, CA, U.S.A. |
| Pseudocode | Yes | 3.2.1. ALGORITHM. Doubly SGD-RR. The algorithm is stated as follows: ① Reshuffle and partition the gradient estimators into minibatches of size b as 𝓑 = {B_1, …, B_m}, where m = M/b is the number of partitions or minibatches. ② Perform gradient descent for t = 1, …, m steps as x_{t+1}^{(i)} = Π_𝒳(x_t^{(i)} − γ_t ĝ_{B_t}(x_t^{(i)})). ③ i ← i + 1 and go back to step ①. |
| Open Source Code | No | The paper mentions: "See the implementation at https://github.com/zixu1986/Doubly_Stochastic_Gradients". However, this is given as an example of a related work's implementation to illustrate shared features across a batch, not as their own code release for the work described in the paper. |
| Open Datasets | No | The paper states: "In particular, we set f_m(x; u) = (L_m/2) ‖x − x*_m + u‖² ... where the smoothness constants L_m ∼ Inv-Gamma(1/2, 1/2) and the stationary points x*_m ∼ 𝒩(0_d, σ² I_d) are sampled randomly..." This describes a synthetic problem setup rather than the use of a public dataset with explicit access information. |
| Dataset Splits | No | The paper describes a synthetic problem and its setup but does not specify any training, validation, or test dataset splits in the typical machine learning sense, as it generates data for simulation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for its simulations or experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers that would be needed for reproducibility. |
| Experiment Setup | Yes | Setup: We evaluate the insight on the tradeoff between b and N for correlated estimators on a synthetic problem. In particular, we set f_m(x; u) = (L_m/2) ‖x − x*_m + u‖², where the smoothness constants L_m ∼ Inv-Gamma(1/2, 1/2) and the stationary points x*_m ∼ 𝒩(0_d, σ² I_d) are sampled randomly, where 0_d is a vector of d zeros and I_d is a d × d identity matrix. Then, we compute the gradient variance at the global optimum, corresponding to computing the BV (Definition 2) constant. Note that σ² here corresponds to the heterogeneity of the data. We make the estimators dependent by sharing u_1, …, u_N across the batch. |
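The Doubly SGD-RR pseudocode quoted above (reshuffle, partition into minibatches of size b, take projected gradient steps, repeat) can be sketched in plain NumPy. This is our own illustrative reconstruction, not the authors' code: the function name `doubly_sgd_rr`, the estimator call signature, and the identity default for the projection Π_𝒳 are all assumptions made for the sketch.

```python
import numpy as np

def doubly_sgd_rr(grad_estimators, x0, step_size, batch_size, n_epochs,
                  rng=None, project=lambda x: x):
    """Hypothetical sketch of doubly SGD with random reshuffling (SGD-RR).

    grad_estimators: list of M callables g_m(x, rng), each returning a
    stochastic gradient of component f_m at x. The "doubly" stochastic part
    is that each estimator may draw fresh randomness at every call.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    M = len(grad_estimators)
    m = M // batch_size  # number of partitions/minibatches per epoch
    for _ in range(n_epochs):
        # Step 1: reshuffle and partition the estimators into minibatches of size b.
        perm = rng.permutation(M)
        for t in range(m):
            batch = perm[t * batch_size:(t + 1) * batch_size]
            # Step 2: projected gradient step with the minibatch-averaged estimator.
            g = np.mean([grad_estimators[j](x, rng) for j in batch], axis=0)
            x = project(x - step_size * g)
        # Step 3: advance the epoch counter and reshuffle again.
    return x
```

On a toy problem with deterministic quadratic components f_c(x) = (x − c)²/2, the iterates settle near the average of the component minimizers, as expected for this kind of scheme.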
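The experiment-setup cell describes the synthetic problem precisely enough to sketch it: quadratic components f_m(x; u) = (L_m/2) ‖x − x*_m + u‖² with L_m ∼ Inv-Gamma(1/2, 1/2) and x*_m ∼ 𝒩(0_d, σ² I_d), gradient variance evaluated at the global optimum, and dependence induced by sharing the draws u_1, …, u_N across the batch. The sketch below is ours, not the paper's code; in particular, the choice of standard Gaussian u draws, the function name `simulate_bv`, and all default parameter values are assumptions.

```python
import numpy as np

def simulate_bv(M=64, d=10, sigma=1.0, b=8, N=4, n_trials=2000, seed=0):
    """Hedged sketch of the synthetic BV setup: empirical gradient variance
    at the global optimum, with Monte Carlo draws shared across the batch."""
    rng = np.random.default_rng(seed)
    # Inv-Gamma(1/2, 1/2): reciprocal of Gamma(shape=1/2, rate=1/2) draws
    # (rate 1/2 corresponds to scale 2 in NumPy's parameterization).
    L = 1.0 / rng.gamma(shape=0.5, scale=2.0, size=M)
    # Stationary points x_m* ~ N(0_d, sigma^2 I_d); sigma^2 = heterogeneity.
    x_star = rng.normal(0.0, sigma, size=(M, d))
    # Global optimum of (1/M) sum_m (L_m/2) ||x - x_m*||^2 (u has mean zero):
    x_opt = (L[:, None] * x_star).sum(axis=0) / L.sum()

    grads = np.empty((n_trials, d))
    for trial in range(n_trials):
        batch = rng.choice(M, size=b, replace=False)
        u = rng.normal(0.0, 1.0, size=(N, d))  # u_1..u_N, shared across the batch
        # grad f_m(x; u) = L_m (x - x_m* + u), averaged over the N shared draws:
        g = np.zeros(d)
        for m_idx in batch:
            g += L[m_idx] * (x_opt - x_star[m_idx] + u.mean(axis=0))
        grads[trial] = g / b
    # Total variance of the minibatch gradient estimator at the optimum.
    return grads.var(axis=0).sum()
```

Because the u draws are shared, the Monte Carlo noise does not average out across the batch, which is exactly the correlated-estimator regime whose b-versus-N tradeoff the simulation probes.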