Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls
Authors: Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on a collection of diverse data sets, including hierarchical single-cell RNA-seq data from different patients distributed across different repositories that have stringent privacy constraints. The classification accuracy of our method is up to 11% better than its Euclidean counterpart, demonstrating the importance of privacy-preserving learning in hyperbolic spaces. Our implementation for the proposed method is available at https://github.com/sauravpr/hyperbolic_federated_classification. |
| Researcher Affiliation | Academia | Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic, all with the Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign |
| Pseudocode | Yes | Algorithm 1 Poincaré Graham Scan... Algorithm 2 CCW... Algorithm 3 Poincaré Uniform Sampling |
| Open Source Code | Yes | Our implementation for the proposed method is available at https://github.com/sauravpr/hyperbolic_federated_classification. |
| Open Datasets | Yes | We consider multi-label SVM classification for three biological data sets: Olsson's scRNA-seq data set (Olsson et al., 2016), UC-Stromal data set (Smillie et al., 2019), and Lung-Human data set (Vieira Braga et al., 2019). Our simulations are performed on the Poincaré embeddings of these data sets, which can be obtained using methods described in (Klimovskaia et al., 2020; Skopek et al., 2020). We illustrate the embeddings in Figure 8, which is also included in Appendix I. For simulating the FL multi-label classification tasks, we use subsets of data corresponding to different combinations of labels, for both the UC-Stromal and the Lung-Human data sets. Since the Olsson data set is quite small (it contains only 319 points from 8 classes), we consider the entire data set for multi-label classification. Detailed information about the data sets and experimental settings is available in Appendix I. |
| Dataset Splits | Yes | We use a 90%/10% random split for each data set to obtain training and test points. For each biological data set, we consider an 85%/15% random split for the training and test points, and keep it fixed for all trials. |
| Hardware Specification | No | The paper does not explicitly describe the hardware specifications (e.g., specific CPU/GPU models, memory) used for running the experiments. It mentions simulations were performed but without hardware details. |
| Software Dependencies | No | We use the kernighan_lin_bisection() module from the NetworkX library (Hagberg et al., 2008) in Python to group the convex hulls. For multi-label classification, we use the SpectralClustering() module from Scikit-learn (Pedregosa et al., 2011). The paper mentions software libraries but does not provide specific version numbers for Python, NetworkX, or Scikit-learn. |
| Experiment Setup | Yes | For both Euclidean and Poincaré SVMs, we set the regularization hyperparameter in (7) to λ = 20,000, which essentially forces the solver to solve the hard margin SVM problem (6). For federated baselines, we consider L = 10, and partition the training data uniformly across clients. For both Euclidean and Poincaré SVMs, we consider a regularization term λ = 0.1. For federated baselines, we set L = 3, and partition the training data uniformly across clients. The default value for the quantization parameter (i.e., distance margin) is ϵ = 0.01. |
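The software dependencies quoted above name two concrete library routines: NetworkX's `kernighan_lin_bisection()` for grouping convex hulls and Scikit-learn's `SpectralClustering()` for multi-label classification. The sketch below shows how those calls are invoked on a toy graph; the graph itself is a stand-in, since the paper's actual affinity construction over convex hulls is not reproduced here.

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection
from sklearn.cluster import SpectralClustering

# Toy graph standing in for the affinity structure over convex hulls
# (the paper's actual graph construction is not reproduced here).
G = nx.karate_club_graph()

# Kernighan-Lin bisection: splits the node set into two roughly balanced
# halves while heuristically minimizing the cut between them.
part_a, part_b = kernighan_lin_bisection(G, seed=0)

# Spectral clustering on a precomputed affinity matrix, as one would use
# for the multi-label setting described above.
affinity = nx.to_numpy_array(G)
labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(affinity)
```

Both routines exist under these names in current NetworkX and Scikit-learn releases; since the paper pins no versions, exact results may vary across library versions.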
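The experiment setup notes that a very large regularization value (λ = 20,000) "essentially forces the solver to solve the hard margin SVM problem." A minimal way to see this effect, using Scikit-learn's `SVC` as an illustrative stand-in rather than the paper's own solver: on linearly separable data, a very large penalty `C` (playing the role of λ here) drives the slack terms to zero and recovers the hard-margin solution.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic, clearly separable 2-D data (illustration only; not the
# paper's Poincaré-embedded data sets).
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=+2.0, scale=0.3, size=(50, 2))
X_neg = rng.normal(loc=-2.0, scale=0.3, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [-1] * 50)

# A very large penalty (2e4, mirroring the quoted λ = 20,000) makes
# margin violations prohibitively expensive, approximating hard-margin SVM.
clf = SVC(kernel="linear", C=2e4).fit(X, y)
print(clf.score(X, y))  # separable data: hard margin fits all points, 1.0
```

Note this analogy holds only on separable data; with overlapping classes no hard-margin solution exists, and the large-penalty soft-margin problem becomes ill-conditioned instead.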