Faster federated optimization under second-order similarity

Authors: Ahmed Khaled, Chi Jin

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We run linear regression with ℓ2 regularization, where each client has a loss function of the form... We do two sets of experiments: in the first set, we generate the data vectors z_{m,i} synthetically... In the second set, we use the a9a dataset from LIBSVM... Our results are given in Figure 1." (Section 5, Experiments)
Researcher Affiliation | Academia | Ahmed Khaled (Princeton University); Chi Jin (Princeton University)
Pseudocode | Yes | "Algorithm 1: Stochastic Proximal Point Method (SPPM). Data: stepsize η, initialization x_0, number of steps K, proximal solution accuracy b. for k = 0, 1, 2, ..., K − 1 do..." A sketch of this loop appears after the table.
Open Source Code | Yes | "We attach the code used to run the experiments as supplementary material to the paper."
Open Datasets | Yes | "In the second set, we use the a9a dataset from LIBSVM (Chang & Lin, 2011)."
Dataset Splits | No | The paper uses synthetic data and the a9a dataset from LIBSVM, and states that "each client's data is constructed by sampling from the original training dataset with n = 2000 samples per client." However, it gives no train/validation/test split percentages, per-split sample counts, or split-construction methodology that would be needed to reproduce the splits.
Hardware Specification | No | "We simulate our results on a single machine, running each method for 10000 communication steps." No specific hardware details (e.g., CPU or GPU model, memory) are provided.
Software Dependencies | No | The paper mentions LIBSVM as the source of the a9a dataset (Chang & Lin, 2011) but provides no version numbers for it or for any other software dependency used in the experiments.
Experiment Setup | Yes | "We run linear regression with ℓ2 regularization, where each client has a loss function of the form... with regularization constant λ = 1... and set the regularization parameter as λ = 0.1... We simulate our results on a single machine, running each method for 10000 communication steps. ...each client's data is constructed by sampling from the original training dataset with n = 2000 samples per client. We compare SVRP against SVRG, SCAFFOLD, and the Accelerated Extragradient algorithms, using the optimal theoretical stepsize for each algorithm." A sketch of this setup follows the second code block below.
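To make the quoted Algorithm 1 pseudocode concrete, here is a minimal sketch of the SPPM loop for the paper's ℓ2-regularized linear-regression setting, where each proximal subproblem is a quadratic and can be solved exactly with one linear solve. The names prox_step, clients, and eta are illustrative assumptions, not taken from the paper's supplementary code; the exact prox below stands in for the inexact proximal solution (accuracy b) that Algorithm 1 allows, and the variable b here denotes a client's response vector, not that accuracy parameter.

```python
import numpy as np

def prox_step(x, A, b, lam, eta):
    """Exact proximal step for one client's ridge-regression loss:
    argmin_y (1/(2n))||A y - b||^2 + (lam/2)||y||^2 + (1/(2 eta))||y - x||^2.
    For a quadratic loss this reduces to a single linear solve."""
    n, d = A.shape
    H = A.T @ A / n + (lam + 1.0 / eta) * np.eye(d)
    g = A.T @ b / n + x / eta
    return np.linalg.solve(H, g)

def sppm(clients, x0, eta, K, rng):
    """Stochastic Proximal Point Method (Algorithm 1, with exact prox):
    at each step, sample one client uniformly at random and move to the
    proximal point of that client's loss around the current iterate."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(K):
        A, b, lam = clients[rng.integers(len(clients))]
        x = prox_step(x, A, b, lam, eta)
    return x
```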
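Similarly, here is a minimal sketch of how the synthetic experiment described in the Experiment Setup row could be assembled, reusing sppm and prox_step from the sketch above. The paper only states that the data vectors z_{m,i} are generated synthetically with n = 2000 samples per client and λ = 1, so the Gaussian distribution, dimension, noise level, and client count below are illustrative assumptions, not the paper's released configuration.

```python
import numpy as np

def make_clients(num_clients, d, n=2000, lam=1.0, seed=0):
    """Build synthetic ridge-regression clients. Each client m holds n
    data vectors z_{m,i} (rows of A) and responses b; lam is the shared
    l2 regularization constant (lam = 1 in the paper's synthetic runs)."""
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=d)                    # common ground-truth model
    clients = []
    for _ in range(num_clients):
        A = rng.normal(size=(n, d))                # assumed Gaussian z_{m,i}
        b = A @ x_star + 0.1 * rng.normal(size=n)  # assumed noise level
        clients.append((A, b, lam))
    return clients

# Example run: 10000 communication steps, as in the paper's simulation;
# the client count, dimension, and stepsize here are illustrative.
clients = make_clients(num_clients=20, d=50)
x = sppm(clients, x0=np.zeros(50), eta=0.1, K=10000,
         rng=np.random.default_rng(1))
```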