Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls
Authors: Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on a collection of diverse data sets, including hierarchical single-cell RNA-seq data from different patients distributed across different repositories that have stringent privacy constraints. The classification accuracy of our method is up to 11% better than its Euclidean counterpart, demonstrating the importance of privacy-preserving learning in hyperbolic spaces. Our implementation for the proposed method is available at https://github.com/sauravpr/hyperbolic_federated_classification. |
| Researcher Affiliation | Academia | Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic, all with the Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign |
| Pseudocode | Yes | Algorithm 1 Poincaré Graham Scan... Algorithm 2 CCW... Algorithm 3 Poincaré Uniform Sampling |
| Open Source Code | Yes | Our implementation for the proposed method is available at https://github.com/sauravpr/hyperbolic_federated_classification. |
| Open Datasets | Yes | We consider multi-label SVM classification for three biological data sets: Olsson's scRNA-seq data set (Olsson et al., 2016), UC-Stromal data set (Smillie et al., 2019), and Lung-Human data set (Vieira Braga et al., 2019). Our simulations are performed on the Poincaré embeddings of these data sets, which can be obtained using methods described in (Klimovskaia et al., 2020; Skopek et al., 2020). We illustrate the embeddings in Figure 8, which is also included in Appendix I. For simulating the FL multi-label classification tasks, we use subsets of data corresponding to different combinations of labels, for both the UC-Stromal and the Lung-Human data sets. Since the Olsson data set is quite small (it contains only 319 points from 8 classes), we consider the entire data set for multi-label classification. Detailed information about the data sets and experimental settings is available in Appendix I. |
| Dataset Splits | Yes | We use a 90%/10% random split for each data set to obtain training and test points. For each biological data set, we consider an 85%/15% random split for the training and test points, and keep it fixed for all trials. |
| Hardware Specification | No | The paper does not explicitly describe the hardware specifications (e.g., specific CPU/GPU models, memory) used for running the experiments. It mentions simulations were performed but without hardware details. |
| Software Dependencies | No | We use the kernighan_lin_bisection() module from the NetworkX library (Hagberg et al., 2008) in Python to group the convex hulls. For multi-label classification, we use the SpectralClustering() module from Scikit-learn (Pedregosa et al., 2011). The paper mentions software libraries but does not provide specific version numbers for Python, NetworkX, or Scikit-learn. |
| Experiment Setup | Yes | For both Euclidean and Poincaré SVMs, we set the regularization hyperparameter in (7) to λ = 20,000, which essentially forces the solver to solve the hard margin SVM problem (6). For federated baselines, we consider L = 10, and partition the training data uniformly across clients. For both Euclidean and Poincaré SVMs, we consider a regularization term λ = 0.1. For federated baselines, we set L = 3, and partition the training data uniformly across clients. The default value for the quantization parameter (i.e., distance margin) is ϵ = 0.01. |
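The software dependencies quoted above name two concrete library routines: NetworkX's `kernighan_lin_bisection()` for grouping convex hulls and Scikit-learn's `SpectralClustering()` for multi-label classification. The sketch below shows how those calls are invoked on a toy graph; the graph itself is a stand-in, since the paper's actual affinity construction over convex hulls is not reproduced here.

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection
from sklearn.cluster import SpectralClustering

# Toy graph standing in for the affinity structure over convex hulls
# (the paper's actual graph construction is not reproduced here).
G = nx.karate_club_graph()

# Kernighan-Lin bisection: splits the node set into two roughly balanced
# halves while heuristically minimizing the cut between them.
part_a, part_b = kernighan_lin_bisection(G, seed=0)

# Spectral clustering on a precomputed affinity matrix, as one would use
# for the multi-label setting described above.
affinity = nx.to_numpy_array(G)
labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)
```

Both routines exist under these names in current NetworkX and Scikit-learn releases; since the paper pins no versions, exact results may vary across library versions.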
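The experiment setup notes that a very large regularization value (λ = 20,000) "essentially forces the solver to solve the hard margin SVM problem." A minimal way to see this effect, using Scikit-learn's `SVC` as an illustrative stand-in rather than the paper's own solver: on linearly separable data, a very large penalty `C` (playing the role of λ here) drives the slack terms to zero and recovers the hard-margin solution.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic, clearly separable 2-D data (illustration only; not the
# paper's Poincaré-embedded data sets).
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, scale=0.3, size=(50, 2))
X_neg = rng.normal(loc=-2.0, scale=0.3, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [-1] * 50)

# A very large penalty (2e4, mirroring the quoted λ = 20,000) makes
# margin violations prohibitively expensive, approximating hard-margin SVM.
clf = SVC(kernel="linear", C=2e4).fit(X, y)
print(clf.score(X, y))  # separable data: hard margin fits all points, 1.0
```

Note this analogy holds only on separable data; with overlapping classes no hard-margin solution exists, and the large-penalty soft-margin problem becomes ill-conditioned instead.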