Global Convergence of Federated Learning for Mixed Regression
Authors: Lili Su, Jiaming Xu, Pengkun Yang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies the problem of model training under Federated Learning when clients exhibit cluster structure. We contextualize this problem in mixed regression, where each client has limited local data generated from one of k unknown regression models. We design an algorithm that achieves global convergence from any initialization, and works even when local data volume is highly unbalanced: there could exist clients that contain only O(1) data points. Our algorithm first runs moment descent on a few anchor clients (each with Θ(k) data points) to obtain coarse model estimates. Then each client alternately estimates its cluster labels and refines the model estimates based on FedAvg or FedProx. A key innovation in our analysis is a uniform estimate on the clustering errors, which we prove by bounding the VC dimension of general polynomial concept classes based on the theory of algebraic geometry. |
| Researcher Affiliation | Academia | Lili Su, Electrical and Computer Engineering, Northeastern University (l.su@northeastern.edu); Jiaming Xu, The Fuqua School of Business, Duke University (jiaming.xu868@duke.edu); Pengkun Yang, Center for Statistical Science, Tsinghua University (yangpengkun@tsinghua.cn) |
| Pseudocode | Yes | The paper presents pseudocode for both phases of the algorithm: Phase 1, Federated Moment Descent (FedMD), and Phase 2, FedX + clustering (where FedX stands for FedAvg or FedProx). |
| Open Source Code | No | The paper is theoretical and does not provide an explicit statement about releasing source code for the methodology described. In the 'Questions for Paper Analysis' section, it states '[N/A]' for questions regarding code availability. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments involving datasets. Therefore, it does not provide concrete access information for a publicly available or open dataset. In the 'Questions for Paper Analysis' section, it states '[N/A]' for questions regarding data. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, hence it does not specify dataset splits for training, validation, or testing. In the 'Questions for Paper Analysis' section, it states '[N/A]' for questions regarding training details like data splits. |
| Hardware Specification | No | The paper is theoretical and does not conduct experiments, thus no hardware specifications for running experiments are provided. In the 'Questions for Paper Analysis' section, it states '[N/A]' for questions regarding compute and resources used. |
| Software Dependencies | No | The paper is theoretical and does not conduct experiments, thus no specific software dependencies with version numbers for replication are provided. In the 'Questions for Paper Analysis' section, it states '[N/A]' for questions regarding compute and resources used. |
| Experiment Setup | No | The paper is theoretical and does not conduct experiments, hence no specific experimental setup details like hyperparameters or training configurations are provided. In the 'Questions for Paper Analysis' section, it states '[N/A]' for questions regarding training details. |
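Since the paper itself reports no experiments, the alternating loop described in its abstract (each client estimates its cluster label, then the server refines the per-cluster models FedAvg-style) can be illustrated on a toy mixed linear regression instance. Everything below is a hypothetical sketch: the dimensions, learning rate, synthetic data, and the use of a single averaged gradient step per round (in place of the paper's full Phase 1 moment-descent initialization and FedAvg/FedProx local updates) are this sketch's assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_clients, n_local = 5, 2, 60, 8  # toy sizes (assumed, not from the paper)

# Two well-separated ground-truth regression models (k = 2), one per cluster.
theta_true = np.vstack([np.ones(d), -np.ones(d)])
labels_true = rng.integers(k, size=n_clients)

# Each client holds n_local samples from its cluster's linear model plus noise.
X = rng.normal(size=(n_clients, n_local, d))
y = np.einsum('cnd,cd->cn', X, theta_true[labels_true]) \
    + 0.01 * rng.normal(size=(n_clients, n_local))

# Coarse initialization: a stand-in for Phase 1 (federated moment descent).
theta = theta_true + 0.5 * rng.normal(size=(k, d))

lr = 0.1
for _ in range(50):
    # Phase 2, step 1: each client picks the model with smallest local loss.
    losses = np.array([[np.mean((X[c] @ theta[j] - y[c]) ** 2)
                        for j in range(k)] for c in range(n_clients)])
    labels = losses.argmin(axis=1)

    # Phase 2, step 2: per-cluster FedAvg-style averaging of local gradients.
    new_theta = theta.copy()
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        grads = [2 * X[c].T @ (X[c] @ theta[j] - y[c]) / n_local
                 for c in members]
        new_theta[j] = theta[j] - lr * np.mean(grads, axis=0)
    theta = new_theta
```

Under this setup the per-cluster models contract toward the ground truth geometrically, and once the coarse estimates are closer to the correct model than to the wrong one, every client's label estimate is stable across rounds, mirroring the uniform clustering-error control the paper's analysis provides.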