Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

OmniFC: Rethinking Federated Clustering via Lossless and Secure Distance Reconstruction

Authors: Jie Yan, Jing Liu, Zhong-Yuan Zhang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Theoretical analysis confirms both reconstruction fidelity and privacy guarantees, while comprehensive experiments demonstrate OmniFC's superior robustness, effectiveness, and generality across various benchmarks compared to state-of-the-art methods.
Researcher Affiliation | Academia | Central University of Finance and Economics
Pseudocode | Yes | A Pseudocode of the Proposed OmniFC. The procedure of OmniFC is formally presented in Algorithm 1. On the client side, each sample $x_i$ is independently encoded into $z_{i,j}$ based on Equation (3), and then transmitted to the $j$-th client, where $i \in [n]$ and $j \in [m]$. Then, each client $j$ computes pairwise distances between all encoded representations $z_{i,j}$ and $z_{i',j}$ ($i, i' \in [n]$) using Equation (4), and transmits the results to the central server. On the server side, the global distance matrix is reconstructed based on Equations (6) and (7), and subsequently utilized by a centralized clustering algorithm to derive the final clustering outcome $\pi$. Algorithm 1: OmniFC
Open Source Code | No | Code will be released.
Open Datasets | Yes | The proposed OmniFC is assessed using seven benchmark datasets across tabular, visual, temporal, and genomic domains, including Iris [32], MNIST [33], Fashion-MNIST [34], COIL-20 [35], COIL-100 [35], Pendigits [36], and 10x_73k [37].
Dataset Splits | Yes | Following Ref. [41, 20], we simulate diverse federated settings by partitioning the real-world dataset into $k$ subsets, each representing a client, and adjusting the non-IID level $p$, where $k$ denotes the number of true clusters. Specifically, for each client, a fraction $p$ of its data is sampled from a single cluster, while the remaining $1 - p$ portion is drawn uniformly across all clusters.
Hardware Specification | Yes | All experiments are implemented in Python and executed on a system equipped with an Intel Core i7-12650H CPU, 16GB of RAM, and an NVIDIA GeForce RTX 4060 GPU.
Software Dependencies | Yes | All centralized clustering methods are implemented by leveraging existing open-source Python libraries: KM, KMed, SC, NMF, and DBSCAN utilize the sklearn library [43], HC employs the scipy library [44], and FCM adopts an individual open-source implementation [45].
Experiment Setup | Yes | Implementation details are provided in Appendix C.2... For OmniFC, $\{\alpha_o\}_{o=1}^{l+t}$ is set as a sequence of $l + t$ consecutive odd integers starting from 1, while $\{\beta_j\}_{j=1}^{m}$ is set as a sequence of $m$ consecutive even integers starting from 0. The default values of $l$ and $t$ are set to 2.
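
The Dataset Splits and Experiment Setup rows describe two settings that are easy to misread in prose, so a minimal Python sketch may help. It is not the authors' code, which is unreleased at the time of this report. The function names `noniid_partition` and `coding_points` are hypothetical. The partitioning is a sample-level approximation of the quoted scheme: each sample goes to its cluster's "home" client with probability p and otherwise to a uniformly random client, which assumes roughly balanced clusters. The number of clients m has no default in the quoted text, so it is a required argument here.

```python
import numpy as np


def noniid_partition(labels, p, seed=0):
    """Split sample indices into k client subsets (k = number of distinct labels).

    Assumption: sample-level approximation of the quoted scheme. With
    probability p a sample is sent to the client paired with its own cluster;
    otherwise it is sent to a client chosen uniformly at random.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)            # sorted distinct cluster labels
    k = len(classes)
    home = np.searchsorted(classes, labels)  # cluster index in 0..k-1 per sample
    stay = rng.random(len(labels)) < p       # True -> keep sample on its home client
    random_client = rng.integers(0, k, size=len(labels))
    client_of = np.where(stay, home, random_client)
    return [np.flatnonzero(client_of == j) for j in range(k)]


def coding_points(m, l=2, t=2):
    """Return (alphas, betas) as described in the Experiment Setup row.

    alphas: l + t consecutive odd integers starting from 1.
    betas:  m consecutive even integers starting from 0.
    The two sets are disjoint, since one is odd and the other even.
    """
    alphas = 1 + 2 * np.arange(l + t)
    betas = 2 * np.arange(m)
    return alphas, betas


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1, 2, 2]
    for client_id, idx in enumerate(noniid_partition(toy_labels, p=1.0)):
        print("client", client_id, "holds samples", idx.tolist())
    alphas, betas = coding_points(m=3)
    print("alphas:", alphas.tolist(), "betas:", betas.tolist())
```

With p = 1 every client holds exactly one cluster, and with p = 0 the assignment is fully uniform; intermediate values interpolate between the two. How Ref. [41, 20] handle unbalanced clusters or fix exact per-client sizes is not stated in the quoted text, so this sketch leaves client sizes random.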