HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections
Authors: Ines Chami, Albert Gu, Dat P Nguyen, Christopher Ré
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now validate the empirical benefits of HOROPCA on three PCA uses. First, (i) we show that it yields much lower distortion and higher explained variance than existing methods, reducing average distortion by up to 77%. Second, (ii) we validate that it can be used for data pre-processing, improving downstream classification by up to 3.8% in Average Precision score compared to methods that don't use whitening. Finally, (iii) we show that the low-dimensional representations learned by HOROPCA can be visualized to qualitatively interpret hyperbolic data. (A hedged sketch of the distortion metric appears after the table.) |
| Researcher Affiliation | Academia | Ines Chami*1, Albert Gu*1, Dat Nguyen*1, Christopher Ré1 ... 1Stanford University, CA, USA. Correspondence to: Ines Chami <chami@cs.stanford.edu>. |
| Pseudocode | No | No structured pseudocode or algorithm blocks found; the paper describes the algorithm in paragraph text. (A hedged sketch of the Busemann coordinates underlying its horospherical projections appears after the table.) |
| Open Source Code | Yes | We open-source our implementation and refer to Appendix C for implementation details on how we implemented all baselines and HOROPCA. https://github.com/HazyResearch/HoroPCA |
| Open Datasets | Yes | For dimensionality reduction experiments, we consider standard hierarchical datasets previously used to evaluate the benefits of hyperbolic embeddings. More specifically, we use the datasets in (Sala et al., 2018) including a fully balanced tree, a phylogenetic tree, a biological graph comprising disease relationships, and a graph of Computer Science (CS) Ph.D. advisor-advisee relationships. ... For data whitening experiments, we reproduce the experimental setup from (Cho et al., 2019) and use the Polbooks, Football and Polblogs datasets, which have 105, 115, and 1,224 nodes, respectively. |
| Dataset Splits | Yes | We reproduce the experimental setup from (Cho et al., 2019), who split the datasets into 50% train and 50% test sets, run classification on 2-dimensional embeddings, and average results over 5 different embedding configurations as was done in the original paper (Table 3). |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or cloud instance types) used for running the experiments were found in the paper. |
| Software Dependencies | No | No specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4), were found in the paper. |
| Experiment Setup | No | No specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings were found in the main text. The paper reproduces the setup of prior work and gives only high-level steps for whitening and evaluation. (A hedged sketch of that evaluation protocol follows the table.) |
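For context on the headline numbers in the Research Type row: the reported "average distortion" compares pairwise hyperbolic distances before and after reduction. Below is a minimal sketch of one common reading of that metric over Poincaré-ball distances (mean relative change in pairwise distance); the paper's exact definition may differ, and `poincare_dist` and `average_distortion` are our names, not the HoroPCA codebase's.

```python
import numpy as np

def poincare_dist(x, y, eps=1e-9):
    """Hyperbolic distance between two points inside the unit Poincare ball."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / (denom + eps))

def average_distortion(X_high, X_low):
    """Mean relative error |d' - d| / d over all point pairs, where d is the
    hyperbolic distance before reduction and d' the distance after.
    This is an assumed reading of the paper's 'average distortion' metric."""
    n = len(X_high)
    errors = []
    for i in range(n):
        for j in range(i + 1, n):
            d = poincare_dist(X_high[i], X_high[j])
            d_red = poincare_dist(X_low[i], X_low[j])
            errors.append(abs(d_red - d) / max(d, 1e-9))  # guard duplicates
    return float(np.mean(errors))
```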
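On the Pseudocode row: although the paper presents its algorithm in prose, the horospherical projections it relies on are built from Busemann functions, which have a standard closed form on the Poincaré ball. The sketch below shows only that coordinate, not the full HoroPCA procedure (which also chooses the ideal directions and projects points onto a geodesic submanifold); `busemann` and the toy usage are our illustration, not the released API.

```python
import numpy as np

def busemann(x, p):
    """Busemann function B_p(x) = log(||p - x||^2 / (1 - ||x||^2)) on the
    Poincare ball, for an ideal point p on the boundary (||p|| = 1).
    Its level sets are horospheres centered at p."""
    return np.log(np.sum((p - x) ** 2) / (1 - np.sum(x ** 2)))

# Toy usage: horospherical (Busemann) coordinates of a point cloud with
# respect to two ideal directions. HoroPCA additionally optimizes these
# directions; this snippet only illustrates the coordinate itself.
rng = np.random.default_rng(0)
X = rng.uniform(-0.4, 0.4, size=(5, 3))  # points safely inside the unit ball
p1 = np.array([1.0, 0.0, 0.0])           # ideal point on the boundary
p2 = np.array([0.0, 1.0, 0.0])
coords = np.array([[busemann(x, p1), busemann(x, p2)] for x in X])
print(coords.shape)  # (5, 2): a 2-D horospherical representation
```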
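The Dataset Splits and Experiment Setup rows describe the evaluation protocol only at a high level: 50/50 train/test splits, classification on 2-dimensional embeddings, and Average Precision averaged over 5 embedding configurations. A minimal sketch of that loop follows, assuming scikit-learn and using Euclidean logistic regression as a stand-in for the classifier (which the paper does not specify in the main text); `evaluate_embeddings` and its arguments are hypothetical names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

def evaluate_embeddings(embedding_runs, labels):
    """Average Precision on 2-D node embeddings with 50/50 train/test splits,
    averaged over the embedding configurations (5 in the paper's setup).
    `embedding_runs`: list of (n_nodes, 2) arrays, one per configuration.
    Assumes binary labels; logistic regression stands in for the
    unspecified classifier."""
    scores = []
    for seed, X in enumerate(embedding_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, labels, test_size=0.5, random_state=seed, stratify=labels)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(
            average_precision_score(y_te, clf.predict_proba(X_te)[:, 1]))
    return float(np.mean(scores))
```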