Katyusha X: Simple Momentum Method for Stochastic Sum-of-Nonconvex Optimization

Authors: Zeyuan Allen-Zhu

ICML 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: A simple illustration on minimizing $f(x) = \frac{1}{2} x^\top (\mu I - BB^\top) x$, where $B \in \mathbb{R}^{1000 \times 1000}$ is a random $\pm 1$ matrix and $\mu = \lambda_1(BB^\top) + 0.5\,(\lambda_1(BB^\top) - \lambda_2(BB^\top))$. Such $f(x)$ is a typical instance in stochastic PCA (Garber et al., 2016). Remark 1. In SVRG, the best learning rate is $\eta = 0.4/L$ after tuning. Remark 2. We used $\eta = 0.4/L$ for Katyusha X^w. We used $\eta = 0.4/L$ and $\tau = 0.1$ for Katyusha X^s. Remark 3. In the mini-batch experiment, we used $\eta = 0.4b/L$ and $\tau = 0.1$. The parallel speed-up is in terms of achieving objective error $10^{-3}, 10^{-5}, 10^{-7}, 10^{-9}$. |
| Researcher Affiliation | Collaboration | Microsoft Research AI. Correspondence to: Zeyuan Allen-Zhu <zeyuan@csail.mit.edu>. |
| Pseudocode | No | The paper describes algorithms with updates in text but does not include a formally labeled 'Algorithm' or 'Pseudocode' block. |
| Open Source Code | No | Full and future versions can be found on https://arxiv.org/abs/1802.03866. This link is to the arXiv paper itself, not source code. |
| Open Datasets | No | A simple illustration on minimizing $f(x) = \frac{1}{2} x^\top (\mu I - BB^\top) x$, where $B \in \mathbb{R}^{1000 \times 1000}$ is a random $\pm 1$ matrix and $\mu = \lambda_1(BB^\top) + 0.5\,(\lambda_1(BB^\top) - \lambda_2(BB^\top))$. Such $f(x)$ is a typical instance in stochastic PCA (Garber et al., 2016). This describes a synthetic dataset without providing access information. |
| Dataset Splits | No | The paper uses a synthetic dataset for illustration but does not specify any train/validation/test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Remark 1. In SVRG, the best learning rate is $\eta = 0.4/L$ after tuning. Remark 2. We used $\eta = 0.4/L$ for Katyusha X^w. We used $\eta = 0.4/L$ and $\tau = 0.1$ for Katyusha X^s. Remark 3. In the mini-batch experiment, we used $\eta = 0.4b/L$ and $\tau = 0.1$. |
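
The synthetic instance and the SVRG step size quoted in the table can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the author's code. It assumes the objective is the quadratic form f(x) = 0.5 * x^T (mu*I - B B^T) x with mu = lambda_1 + 0.5 * (lambda_1 - lambda_2). The split into components f_i built from the columns of B, the smoothness constant L derived from that split, the reduced dimension (50 instead of 1000), and all function names are hypothetical choices made for illustration. Only plain SVRG is shown; the Katyusha X momentum step is not implemented here.

```python
import numpy as np

def make_instance(d=50, seed=0):
    """Random +/-1 matrix B and shift mu = lam1 + 0.5 * (lam1 - lam2)."""
    rng = np.random.default_rng(seed)
    B = rng.choice([-1.0, 1.0], size=(d, d))
    # Eigenvalues of the symmetric matrix B B^T, sorted in descending order.
    eigs = np.sort(np.linalg.eigvalsh(B @ B.T))[::-1]
    lam1, lam2 = eigs[0], eigs[1]
    mu = lam1 + 0.5 * (lam1 - lam2)
    return B, mu

def grad_i(x, B, mu, i):
    """Gradient of the assumed component f_i(x) = 0.5 * x^T (mu*I - n * b_i b_i^T) x."""
    n = B.shape[1]
    b = B[:, i]  # i-th column of B
    return mu * x - n * b * (b @ x)

def full_grad(x, B, mu):
    """Gradient of f(x) = (1/n) * sum_i f_i(x) = 0.5 * x^T (mu*I - B B^T) x."""
    return mu * x - B @ (B.T @ x)

def smoothness(B, mu):
    """Largest gradient-Lipschitz constant over the assumed components f_i."""
    n = B.shape[1]
    col_sq = np.sum(B * B, axis=0)  # squared norm of each column b_i
    # The Hessian mu*I - n * b_i b_i^T has eigenvalues mu and mu - n * ||b_i||^2.
    return max(abs(mu), float(np.max(np.abs(mu - n * col_sq))))

def svrg_epoch(x0, B, mu, eta, m, rng):
    """One SVRG epoch: full gradient at the snapshot x0, then m variance-reduced steps."""
    n = B.shape[1]
    g_snap = full_grad(x0, B, mu)
    x = x0.copy()
    for _ in range(m):
        i = rng.integers(n)
        # Unbiased estimator of the full gradient at x.
        v = grad_i(x, B, mu, i) - grad_i(x0, B, mu, i) + g_snap
        x = x - eta * v
    return x

# Usage: the step size eta = 0.4 / L follows Remark 1 in the table.
B, mu = make_instance()
L = smoothness(B, mu)
eta = 0.4 / L
x1 = svrg_epoch(np.ones(B.shape[0]), B, mu, eta, m=B.shape[1], rng=np.random.default_rng(0))
```

Under this split each component Hessian has the negative eigenvalue mu - n * ||b_i||^2 while the average Hessian mu*I - B B^T stays positive definite, which is the sum-of-nonconvex setting named in the paper's title.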