Distributed Optimization for Overparameterized Problems: Achieving Optimal Dimension Independent Communication Complexity

Authors: Bingqing Song, Ioannis Tsaknakis, Chung-Yiu Yau, Hoi-To Wai, Mingyi Hong

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Preliminary Numerical Experiments. We conclude by presenting a numerical experiment for the UCI Tom’s Hardware dataset using Alg. 2 where we applied ⌊log(t+100)⌋ rounds of communication at the t-th iteration for the CHOCO-GOSSIP subroutine; see Appendix F.1. We consider a ring network with K = 5 agents, each one has 500 or 1000 samples (thus making N = 2500, or N = 5000). We construct D-dimensional features from the dataset as NTK features [Bietti and Mairal, 2019]. In Fig. 1, we train a least square regression model in the overparameterized regime.
Researcher Affiliation | Academia | Bingqing Song, Department of ECE, University of Minnesota, song0409@umn.edu; Ioannis Tsaknakis, Department of ECE, University of Minnesota, tsakn001@umn.edu; Chung-Yiu Yau, Department of SEEM, Chinese University of Hong Kong, cyyau@se.cuhk.edu.hk; Hoi-To Wai, Department of SEEM, Chinese University of Hong Kong, htwai@cuhk.edu.hk; Mingyi Hong, Department of ECE, University of Minnesota, mhong@umn.edu
Pseudocode | Yes | Algorithm 1: Limited Communication Distributed Optimization Algorithm ... Algorithm 2: Decentralized Gradient Descent with Compressed Comm. via Linear Compression (a hedged sketch of a compressed-gossip round appears after this table)
Open Source Code | No | The paper does not provide any explicit statements about making its source code open, nor does it include a link to a code repository for the described methodology.
Open Datasets | Yes | Preliminary Numerical Experiments. We conclude by presenting a numerical experiment for the UCI Tom’s Hardware dataset using Alg. 2 where we applied ⌊log(t+100)⌋ rounds of communication at the t-th iteration for the CHOCO-GOSSIP subroutine; see Appendix F.1. We consider a ring network with K = 5 agents, each one has 500 or 1000 samples (thus making N = 2500, or N = 5000). We construct D-dimensional features from the dataset as NTK features [Bietti and Mairal, 2019]. In Fig. 1, we train a least square regression model in the overparameterized regime. Available: https://archive.ics.uci.edu/ml/datasets/Buzz+in+social+media+
Dataset Splits | No | The paper mentions training a model on the dataset but does not specify details regarding train/validation/test splits or cross-validation setup for reproducibility.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup | Yes | We consider a ring network with K = 5 agents, each one has 500 or 1000 samples (thus making N = 2500, or N = 5000). ... we applied ⌊log(t+100)⌋ rounds of communication at the t-th iteration for the CHOCO-GOSSIP subroutine. (A runnable sketch of this setup follows the table.)
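
Algorithm 2 quoted above combines decentralized gradient descent with compressed communication through a CHOCO-GOSSIP subroutine. Below is a minimal NumPy sketch of one CHOCO-GOSSIP-style round, not the authors' implementation: the mixing matrix W, the step size gamma, and the top-k compressor are illustrative assumptions (the paper itself uses a linear compression operator).

import numpy as np

def top_k_compress(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest.
    A generic contractive compressor, used here only for illustration."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def choco_gossip_round(X, X_hat, W, gamma, k):
    """One CHOCO-GOSSIP-style round (Koloskova et al., 2019).

    X     : (K, D) private parameter copies, one row per agent
    X_hat : (K, D) publicly shared (compressed) copies
    W     : (K, K) row-stochastic mixing matrix of the network
    """
    # Each agent compresses the gap between its private and public copy
    # and broadcasts only that compressed difference to its neighbours.
    Q = np.stack([top_k_compress(X[i] - X_hat[i], k) for i in range(X.shape[0])])
    X_hat = X_hat + Q
    # Consensus step on the public copies.
    X = X + gamma * (W @ X_hat - X_hat)
    return X, X_hat

# Toy usage: 5 agents on a ring; repeated rounds drive the private copies
# toward consensus (the conservative step size is an arbitrary choice).
K, D = 5, 50
W = np.zeros((K, K))
for i in range(K):
    W[i, i], W[i, (i - 1) % K], W[i, (i + 1) % K] = 0.5, 0.25, 0.25
X = np.random.default_rng(0).standard_normal((K, D))
X_hat = np.zeros_like(X)
for _ in range(200):
    X, X_hat = choco_gossip_round(X, X_hat, W, gamma=0.05, k=10)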
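
To make the quoted experiment setup concrete, the following self-contained NumPy sketch mirrors its structure under clearly labelled assumptions: the raw data is synthetic rather than the UCI Tom's Hardware table, random ReLU features stand in for the NTK features of [Bietti and Mairal, 2019], D, raw_dim, the step size, and the iteration count are arbitrary, the logarithm base in the ⌊log(t+100)⌋ schedule is assumed natural, and exact ring averaging replaces the compressed CHOCO-GOSSIP subroutine sketched above.

import numpy as np

rng = np.random.default_rng(0)

# Sizes taken from the quoted setup: K = 5 agents on a ring, 500 samples
# each (N = 2500); D is picked larger than N so the least-squares problem
# is overparameterized. raw_dim is a hypothetical raw feature width.
K, n_local, D, raw_dim = 5, 500, 4000, 96
N = K * n_local

A_raw = rng.standard_normal((N, raw_dim))        # placeholder for the UCI data
b = rng.standard_normal(N)                       # placeholder regression targets
proj = rng.standard_normal((raw_dim, D)) / np.sqrt(raw_dim)
A = np.maximum(A_raw @ proj, 0.0)                # random ReLU features (NTK stand-in)

# Symmetric ring mixing matrix: each agent averages with its two neighbours.
W = np.zeros((K, K))
for i in range(K):
    W[i, i], W[i, (i - 1) % K], W[i, (i + 1) % K] = 0.5, 0.25, 0.25

A_loc, b_loc = np.split(A, K), np.split(b, K)    # 500 samples per agent
X = np.zeros((K, D))                             # one parameter copy per agent
lr = 1e-2                                        # arbitrary step size

for t in range(100):
    # Local least-squares gradient step on each agent's shard.
    grads = np.stack([A_loc[i].T @ (A_loc[i] @ X[i] - b_loc[i]) / n_local
                      for i in range(K)])
    X = X - lr * grads
    # floor(log(t + 100)) mixing rounds per iteration, following the quoted
    # schedule; exact averaging stands in for the compressed gossip step.
    for _ in range(int(np.floor(np.log(t + 100)))):
        X = W @ X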