Variance-Reduced Stochastic Gradient Descent on Streaming Data
Authors: Ellango Jothimurugesan, Ashraf Tahmasbi, Phillip Gibbons, Srikanta Tirthapura
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical and experimental results show that the risk of STRSAGA is comparable to that of an offline algorithm on a variety of input arrival patterns, and its experimental performance is significantly better than prior algorithms suited for streaming data, such as SGD and SSVRG. |
| Researcher Affiliation | Academia | Ellango Jothimurugesan (Carnegie Mellon University, ejothimu@cs.cmu.edu); Ashraf Tahmasbi (Iowa State University, tahmasbi@iastate.edu); Phillip B. Gibbons (Carnegie Mellon University, gibbons@cs.cmu.edu); Srikanta Tirthapura (Iowa State University, snt@iastate.edu) |
| Pseudocode | Yes | Algorithm 1 depicts the steps taken to process the zero or more points Xi arriving at time step i. |
| Open Source Code | No | No explicit statement or link providing access to the open-source code for the methodology described in this paper was found. |
| Open Datasets | Yes | For logistic regression, we use the A9A [DKT17] and RCV1.binary [LYRL04] datasets, and for matrix factorization, we use two datasets of user-item ratings from Movielens [HK16]. More detail on the datasets is provided in the supplementary material. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits (e.g., percentages or sample counts) in the main text. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for the experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned in the paper. |
| Experiment Setup | Yes | In our experiments, the training data arrives over the course of 100 time steps, with skewed arrivals parameterized by M = 8λ. At each time step i, a streaming data algorithm has access to ρ gradient computations to update the model; we show results for ρ/λ = 1 and ρ/λ = 5. |
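The setup described in the last row (data arriving over 100 time steps, skewed arrivals capped at M = 8λ, and a per-step budget of ρ gradient computations) can be illustrated with a small simulation. The sketch below is our reading of a STRSAGA-style update, not the authors' code (which the report notes is not released): per time step it spends the ρ-gradient budget alternating between inserting newly arrived points into the sample set and taking SAGA variance-reduced steps on already-seen points. The least-squares objective, step size `eta`, and Poisson arrival model are all illustrative assumptions standing in for the paper's logistic-regression experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic streaming least-squares problem (hypothetical stand-in for the
# paper's A9A / RCV1 logistic-regression tasks).
d = 5
w_true = rng.normal(size=d)

def grad(w, x, y):
    # Gradient of 0.5 * (x.w - y)^2 with respect to w.
    return (x @ w - y) * x

lam = 4          # mean arrivals per time step (lambda)
rho = 5 * lam    # gradient budget per step (rho / lambda = 5, as in the paper)
eta = 0.05       # step size (assumed, not from the paper)

X, Y = [], []            # points inserted into the effective sample set
alpha = []               # stored gradient per inserted point (SAGA table)
alpha_sum = np.zeros(d)  # running sum of stored gradients
buffer = []              # arrived but not-yet-inserted points
w = np.zeros(d)

for t in range(100):     # training data arrives over 100 time steps
    # Skewed arrivals: Poisson(lambda) draws capped at M = 8 * lambda.
    n_new = min(rng.poisson(lam), 8 * lam)
    for _ in range(n_new):
        x = rng.normal(size=d)
        buffer.append((x, x @ w_true + 0.01 * rng.normal()))
    for k in range(rho):  # spend the per-step gradient budget
        if k % 2 == 0 and buffer:
            # Insert a waiting point: store its gradient and take a plain step.
            x, y = buffer.pop(0)
            X.append(x); Y.append(y)
            g = grad(w, x, y)
            alpha.append(g); alpha_sum += g
            w = w - eta * g
        elif X:
            # SAGA step on a uniformly sampled already-seen point.
            i = rng.integers(len(X))
            g = grad(w, X[i], Y[i])
            w = w - eta * (g - alpha[i] + alpha_sum / len(X))
            alpha_sum += g - alpha[i]
            alpha[i] = g
```

The alternation between inserting arrivals and variance-reduced steps on seen points is what lets the risk track an offline algorithm's even when arrivals are bursty; under the paper's actual algorithm the insertion schedule is governed by Algorithm 1 rather than the simple even/odd rule assumed here.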