Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Convex Relaxation for Solving Large-Margin Classifiers in Hyperbolic Space
Authors: Sheng Yang, Peihan Liu, Cengiz Pehlevan
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From extensive empirical experiments, these methods are shown to achieve better classification accuracies than the projected gradient descent approach in most of the synthetic and real two-dimensional hyperbolic embedding datasets under the one-vs-rest multiclass-classification scheme. |
| Researcher Affiliation | Academia | Sheng Yang (EMAIL), John A. Paulson School of Engineering and Applied Sciences, Harvard University; Peihan Liu (EMAIL), John A. Paulson School of Engineering and Applied Sciences, Harvard University; Cengiz Pehlevan (EMAIL), John A. Paulson School of Engineering and Applied Sciences, Center for Brain Science, Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University |
| Pseudocode | No | The paper provides detailed mathematical formulations and transformations of the problem (e.g., Equation (7), (8), (13), (14)), but it does not include any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | The code for our implementations is at https://github.com/yangshengaa/hsvm-relax. |
| Open Datasets | Yes | Regarding real datasets, our experiments include two machine learning benchmark datasets, CIFAR-10 (Krizhevsky et al., 2009) and Fashion-MNIST (Xiao et al., 2017), with their hyperbolic embeddings obtained through standard hyperbolic embedding procedures (Chien et al., 2021; Khrulkov et al., 2020; Klimovskaia et al., 2020) to assess image classification performance. Additionally, we incorporate three graph embedding datasets, football, karate, and polbooks, obtained from Chien et al. (2021), to evaluate the effectiveness of our methods on graph-structured data. We also explore cell embedding datasets, including the Paul Myeloid Progenitors developmental dataset (Paul et al., 2015), the Olsson Single-Cell RNA sequencing dataset (Olsson et al., 2016), the Krumsiek Simulated Myeloid Progenitors dataset (Krumsiek et al., 2011), and the Moignard blood cell developmental trace dataset from single-cell gene expression (Moignard et al., 2015). |
| Dataset Splits | Yes | The primary metrics for assessing model performance are average training and testing loss, accuracy, and weighted F1 score under a stratified 5-fold train-test split scheme. |
| Hardware Specification | Yes | All experiments are run and timed on a machine with 8 Intel Broadwell/Ice Lake CPUs and 40GB of memory. |
| Software Dependencies | Yes | We use MOSEK (ApS, 2022) in Python as our optimization solver without any intermediate parser...Our Python code also uses some common publicly available packages, including NumPy (Harris et al., 2020)... Matplotlib (Hunter, 2007)... Pandas (McKinney et al., 2010) under a BSD license, scikit-learn (Pedregosa et al., 2011) |
| Experiment Setup | Yes | The PGD implementation follows from adapting the MATLAB code in Cho et al. (2019), with learning rate 0.001 and 2000 epochs for synthetic and 4000 epochs for real datasets, warm-started with a Euclidean SVM solution... We first report performances of three models using the one-vs-rest training scheme, described in Appendix D, in Tables 6 to 8 for C ∈ {0.1, 1.0, 10} respectively |
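The evaluation protocol above reports metrics under a stratified 5-fold train-test split. A minimal sketch of such a split is shown below; this is an illustrative stdlib-only implementation, not the authors' code (their pipeline relies on scikit-learn, per the dependencies row, where `sklearn.model_selection.StratifiedKFold` provides the same behavior).

```python
# Hedged sketch: stratified k-fold splitting, where each test fold
# preserves the class proportions of the full label set.
# Not the authors' implementation; a stand-in for scikit-learn's StratifiedKFold.
from collections import defaultdict


def stratified_kfold(labels, k=5):
    """Yield (train_idx, test_idx) pairs over range(len(labels))."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    # Deal each class's indices round-robin across the k folds,
    # so every fold keeps roughly the same class balance.
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

With 10 samples of each of two classes, every test fold contains exactly two samples per class, matching the stratification requirement.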