Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis

Authors: Dachao Lin, Yuze Han, Haishan Ye, Zhihua Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the advantages of our algorithms, we conduct the same numerical experiments as those in [35, 33]. We focus on the linear ridge regression problem with $\ell_2$ regularization, where the average loss $f$ has the formulation $f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x)$, with $f_i(x) := \frac{1}{m}\sum_{j=1}^{m}\big(z_{i,j}^\top x - y_{i,j}\big)^2 + \frac{\mu}{2}\|x\|^2$. Here $z_{i,j} \in \mathbb{R}^d$ and $y_{i,j} \in \mathbb{R}$, $i \in [n]$, $j \in [m]$, serve as the feature and label respectively, and $m$ can be viewed as the data size in each local client. We consider a synthetic dataset generated by adding a small random noise matrix to the center matrix, ensuring a small $\delta$. To capture the differences in convergence rates between our methods and SVRP caused by different magnitudes of $\mu$, we vary $\mu = 10^{-i}$, $i \in \{0, 1, 2\}$. We compare our methods (SVRS and Acc SVRS) against SVRG, Katyusha X, SVRP (Catalyzed SVRP is somehow hard to tune so we omit it), and Accelerated Extragradient (Acc EG) using their theoretical step sizes, except that we scale the interpolation parameter $\tau$ in Katyusha X and Acc SVRS for producing practical performance (see Appendix G for detail). From Figure 1, we can observe that for a large $\mu$, SVRP outperforms existing algorithms due to its high-order dependence on $\mu$. However, when the problem becomes ill-conditioned with a small $\mu$, Acc SVRS exhibits significant improvements compared to other algorithms.
Researcher Affiliation | Collaboration | Dachao Lin, Yuze Han, Haishan Ye, Zhihua Zhang. Equal Contribution. Dachao Lin: Academy for Advanced Interdisciplinary Studies, Peking University, lindachao@pku.edu.cn. Yuze Han: School of Mathematical Sciences, Peking University, hanyuze97@pku.edu.cn. Haishan Ye (Corresponding Author): School of Management, Xi'an Jiaotong University; SGIT AI Lab, State Grid Corporation of China; yehaishan@xjtu.edu.cn. Zhihua Zhang: School of Mathematical Sciences, Peking University, zhzhang@math.pku.edu.cn.
Pseudocode | Yes | Algorithm 1: SVRS1ep$(f, w_0, \theta, p)$; Algorithm 2: Accelerated SVRS (Acc SVRS).
Open Source Code | No | The paper does not provide any statement about releasing source code or links to a code repository for the methodology described.
Open Datasets | No | The paper states: 'We consider a synthetic dataset generated by adding a small random noise matrix to the center matrix, ensuring a small δ.' However, it does not provide any concrete access information (link, DOI, formal citation) for this synthetic dataset.
Dataset Splits | No | The paper mentions conducting numerical experiments on a synthetic dataset but does not specify any training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper describes numerical experiments on synthetic data, but it does not provide any specific details about the hardware used (e.g., GPU models, CPU types, or cloud resources).
Software Dependencies | No | The paper mentions comparing methods like SVRG, Katyusha X, SVRP, and Acc EG, but it does not specify any software dependencies (e.g., programming languages, libraries, or solvers) with version numbers.
Experiment Setup | Yes | To capture the differences in convergence rates between our methods and SVRP caused by different magnitudes of $\mu$, we vary $\mu = 10^{-i}$, $i \in \{0, 1, 2\}$. We compare our methods (SVRS and Acc SVRS) against SVRG, Katyusha X, SVRP (Catalyzed SVRP is somehow hard to tune so we omit it), and Accelerated Extragradient (Acc EG) using their theoretical step sizes, except that we scale the interpolation parameter $\tau$ in Katyusha X and Acc SVRS for producing practical performance (see Appendix G for detail).
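
The experiment setup quoted above can be illustrated with a short sketch. The Python snippet below is a hedged illustration, not the authors' code: the dimensions (n, m, d), the noise scale, the step size, and the plain gradient-descent baseline are all assumptions made for illustration, and none of the compared methods (SVRS, Acc SVRS, SVRP, SVRG, Katyusha X, Acc EG) are implemented here. It only shows the synthetic data generation (a shared center matrix plus a small per-client noise matrix) and the ridge objective $f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x)$ swept over $\mu = 10^{-i}$, $i \in \{0, 1, 2\}$.

```python
# Hypothetical sketch of the synthetic ridge-regression setup described above.
# Dimensions, noise scale, and the gradient-descent baseline are assumptions,
# not the paper's actual configuration (SVRS / Acc SVRS are not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 10, 100, 50          # assumed: n clients, m samples per client, dimension d
noise_scale = 1e-3             # assumed: small perturbation -> small similarity delta

# Each client's feature matrix = shared center matrix + small random noise matrix.
Z_center = rng.standard_normal((m, d))
Z = [Z_center + noise_scale * rng.standard_normal((m, d)) for _ in range(n)]
x_true = rng.standard_normal(d)
Y = [Zi @ x_true + 0.01 * rng.standard_normal(m) for Zi in Z]

def f_i(x, Zi, yi, mu):
    """Local ridge loss: (1/m) * ||Z_i x - y_i||^2 + (mu/2) * ||x||^2."""
    return np.mean((Zi @ x - yi) ** 2) + 0.5 * mu * np.dot(x, x)

def f(x, mu):
    """Average loss over the n clients."""
    return np.mean([f_i(x, Zi, yi, mu) for Zi, yi in zip(Z, Y)])

def grad_f(x, mu):
    """Full gradient of the average loss."""
    g = np.zeros(d)
    for Zi, yi in zip(Z, Y):
        g += (2.0 / m) * Zi.T @ (Zi @ x - yi)
    return g / n + mu * x

# Sweep mu = 10^{-i}, i in {0, 1, 2}, as in the experiments; plain gradient
# descent stands in for the variance-reduced methods compared in the paper.
for i in range(3):
    mu = 10.0 ** (-i)
    x = np.zeros(d)
    for _ in range(200):
        x -= 0.01 * grad_f(x, mu)   # assumed fixed step size
    print(f"mu = {mu:g}: f(x) = {f(x, mu):.4f}")
```

The small noise scale keeps each local Hessian close to the average Hessian, which is what the quoted "small δ" plausibly refers to in the average second-order similarity setting studied by the paper.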