Semi-Cyclic Stochastic Gradient Descent
Authors: Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To illustrate the challenges of optimizing on block-cyclic data, we train and evaluate a small logistic regression model on the Sentiment140 Twitter dataset (Go et al., 2009), a binary classification task over 1.6 × 10⁶ examples. We split the data into training (90%) and test (10%) sets, partition it into m = 6 components based on the timestamps (but not dates) of the Tweets: 12am–4am, 4am–8am, etc., then divide each component across K = 10 cycles (days). For more details, see Appendix C. For simplicity, we keep the model architecture (linear bag-of-words classifier over top 1024 words) and minibatch size (128) fixed; we used a learning rate of η = 0.464 (determined through log-space grid search) except for the per-component SGD approach (8), where η = 1.0 was optimal due to fewer iterations. Figure 1 illustrates how the block-cyclic structure of data can hurt accuracy of a consensus model. Figure 2 compares results from the four approaches. When the component distributions are different, pluralism can outperform the ideal i.i.d. baseline, as our experiments illustrate. (Code sketches of this partitioning and of the SGD variants follow the table.) |
| Researcher Affiliation | Collaboration | Hubert Eichner¹, Tomer Koren¹, H. Brendan McMahan¹, Nathan Srebro², Kunal Talwar¹. ¹Google. ²Toyota Technological Institute at Chicago. Part of this work was done while NS was visiting Google. |
| Pseudocode | No | The paper describes algorithms through mathematical equations and textual explanations, but it does not include a distinct pseudocode block or an algorithm box. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | To illustrate the challenges of optimizing on block-cyclic data, we train and evaluate a small logistic regression model on the Sentiment140 Twitter dataset (Go et al., 2009), a binary classification task over 1.6 × 10⁶ examples. |
| Dataset Splits | Yes | We split the data into training (90%) and test (10%) sets, partition it into m = 6 components based on the timestamps (but not dates) of the Tweets: 12am–4am, 4am–8am, etc., then divide each component across K = 10 cycles (days). |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., CPU, GPU, memory, or specific machine names). |
| Software Dependencies | No | The paper does not provide specific software names with version numbers that would be necessary for reproduction. It mentions using 'logistic regression model' and 'SGD' but no specific libraries or versions. |
| Experiment Setup | Yes | For simplicity, we keep the model architecture (linear bag-of-words classifier over top 1024 words) and minibatch size (128) fixed; we used a learning rate of η = 0.464 (determined through log-space grid search) except for the per-component SGD approach (8), where η = 1.0 was optimal due to fewer iterations. (A minimal training-loop sketch with these settings appears below.) |
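
To make the block-cyclic setup concrete, here is a minimal sketch of the data partitioning the paper describes: m = 6 components bucketed by time of day in 4-hour windows, each component divided across K = 10 cycles (days). The function name `block_cyclic_partition`, the POSIX-second timestamps, and the even split across cycles are assumptions for illustration, not the paper's released preprocessing.

```python
import numpy as np

def block_cyclic_partition(timestamps, num_components=6, num_cycles=10):
    """Group example indices into a K x m grid of blocks: m components by
    time of day (dates ignored), K cycles (days).

    timestamps: array of POSIX times in seconds. Bucketing into equal
    4-hour windows and splitting each component evenly across cycles is
    an assumption based on the paper's description.
    """
    timestamps = np.asarray(timestamps)
    seconds_in_day = timestamps % 86400            # time of day only
    window = 86400 // num_components               # 14400 s = 4 h for m = 6
    component = (seconds_in_day // window).astype(int)

    blocks = [[None] * num_components for _ in range(num_cycles)]
    for comp in range(num_components):
        idx = np.flatnonzero(component == comp)
        # divide this component's examples across the K cycles
        for k, chunk in enumerate(np.array_split(idx, num_cycles)):
            blocks[k][comp] = chunk
    return blocks
```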
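The training procedure itself is plain minibatch SGD on a logistic-loss linear model. The sketch below, with the hypothetical helper `sgd_logistic`, hard-codes the paper's reported choices (1024 bag-of-words features, minibatch size 128, η = 0.464 from a log-space grid search). For simplicity it draws i.i.d. shuffled minibatches; in the block-cyclic experiments, batches would instead come from the currently active component's block.

```python
import numpy as np

def sgd_logistic(X, y, lr=0.464, batch_size=128, seed=0):
    """One pass of minibatch SGD on a linear logistic classifier.

    X: (n, 1024) bag-of-words feature matrix; y: labels in {0, 1}.
    Sketch only; the paper's exact feature pipeline and iteration
    schedule are not reproduced here.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        batch = order[start:start + batch_size]
        p = 1.0 / (1.0 + np.exp(-(X[batch] @ w + b)))  # sigmoid predictions
        grad = p - y[batch]           # gradient of the log-loss w.r.t. logits
        w -= lr * (X[batch].T @ grad) / len(batch)
        b -= lr * grad.mean()
    return w, b
```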
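Finally, the per-component ("pluralistic") SGD approach, equation (8) in the paper, trains a separate model on each component's data alone; the paper reports η = 1.0 as optimal there because each model sees roughly 1/m of the total iterations. A sketch reusing the hypothetical helpers above:

```python
import numpy as np

def pluralistic_models(blocks, X, y, lr=1.0):
    """Fit one model per component on that component's examples only.

    blocks: the K x m grid from block_cyclic_partition; X, y: the full
    training set. Returns a list of m (w, b) models.
    """
    num_cycles, num_components = len(blocks), len(blocks[0])
    models = []
    for comp in range(num_components):
        # gather this component's indices across all K cycles
        idx = np.concatenate([blocks[k][comp] for k in range(num_cycles)])
        models.append(sgd_logistic(X[idx], y[idx], lr=lr))
    return models
```

Wiring the sketches together (feature extraction omitted) would look like `blocks = block_cyclic_partition(train_timestamps)` followed by `models = pluralistic_models(blocks, X_train, y_train)`; at serving time each example would be routed to the model for its time-of-day component.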