HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections

Authors: Ines Chami, Albert Gu, Dat P Nguyen, Christopher Ré

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now validate the empirical benefits of HOROPCA on three PCA uses. First, (i) we show that it yields much lower distortion and higher explained variance than existing methods, reducing average distortion by up to 77%. Second, (ii) we validate that it can be used for data pre-processing, improving downstream classification by up to 3.8% in Average Precision score compared to methods that don't use whitening. Finally, (iii) we show that the low-dimensional representations learned by HOROPCA can be visualized to qualitatively interpret hyperbolic data." (A sketch of the average-distortion metric referenced here follows the table.)
Researcher Affiliation | Academia | "Ines Chami*1 Albert Gu*1 Dat Nguyen*1 Christopher Ré1 ... 1Stanford University, CA, USA. Correspondence to: Ines Chami <chami@cs.stanford.edu>."
Pseudocode | No | No structured pseudocode or algorithm blocks were found; the paper describes the algorithm in paragraph text. (An illustrative sketch of the Busemann coordinates underlying horospherical projection follows the table.)
Open Source Code | Yes | "We open-source our implementation [4] and refer to Appendix C for implementation details on how we implemented all baselines and HOROPCA." [4] https://github.com/HazyResearch/HoroPCA
Open Datasets | Yes | "For dimensionality reduction experiments, we consider standard hierarchical datasets previously used to evaluate the benefits of hyperbolic embeddings. More specifically, we use the datasets in (Sala et al., 2018) including a fully balanced tree, a phylogenetic tree, a biological graph comprising of diseases relationships and a graph of Computer Science (CS) Ph.D. advisor-advisee relationships." and "For data whitening experiments, we reproduce the experimental setup from (Cho et al., 2019) and use the Polbooks, Football and Polblogs datasets which have 105, 115 and 1224 nodes each."
Dataset Splits | Yes | "We reproduce the experimental setup from (Cho et al., 2019) who split the datasets in 50% train and 50% test sets, run classification on 2-dimensional embeddings and average results over 5 different embedding configurations as was done in the original paper (Table 3)." (A sketch of this evaluation loop follows the table.)
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or cloud instance types) used for running the experiments were found in the paper.
Software Dependencies | No | No specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4), were found in the paper.
Experiment Setup | No | No specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings were found in the main text. The paper mentions reproducing the setup from prior work and gives only high-level steps for whitening and evaluation.
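For context on the distortion numbers quoted in the Research Type row, below is a minimal sketch of how average distortion is commonly computed for hyperbolic dimensionality reduction: the mean relative change in pairwise Poincaré-ball distance after projecting to a lower dimension. The helper names here are assumptions for illustration; the paper's actual evaluation code lives in the linked repository.

```python
import numpy as np

def poincare_dist(x, y, eps=1e-9):
    """Hyperbolic distance between two points in the Poincare ball model."""
    sq = np.sum((x - y) ** 2)
    nx = np.sum(x ** 2)
    ny = np.sum(y ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - nx) * (1 - ny) + eps))

def average_distortion(X_high, X_low):
    """Mean of |d_low - d_high| / d_high over all point pairs, where
    d_high/d_low are distances before/after dimensionality reduction."""
    n = len(X_high)
    ratios = []
    for i in range(n):
        for j in range(i + 1, n):
            d0 = poincare_dist(X_high[i], X_high[j])
            d1 = poincare_dist(X_low[i], X_low[j])
            ratios.append(abs(d1 - d0) / d0)
    return float(np.mean(ratios))
```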
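Since the paper itself has no pseudocode block (Pseudocode row), here is an illustrative, non-authoritative sketch of the building block behind horospherical projection: the Busemann function of an ideal point p on the boundary of the Poincaré ball, whose level sets are horospheres. The formula B_p(x) = log(||p - x||^2 / (1 - ||x||^2)) is the standard one for this model; treating these values as coordinates is the idea the method builds on, but this is not the authors' implementation.

```python
import numpy as np

def busemann(p, x, eps=1e-9):
    """Busemann function B_p(x) in the Poincare ball for an ideal
    point p with ||p|| = 1; its level sets are horospheres at p."""
    num = np.sum((p - x) ** 2, axis=-1)
    den = 1.0 - np.sum(x ** 2, axis=-1)
    return np.log(num / (den + eps))

# Toy usage: Busemann coordinates of points w.r.t. two ideal directions.
p1 = np.array([1.0, 0.0])
p2 = np.array([0.0, 1.0])
x = np.array([[0.3, 0.1], [-0.2, 0.4]])  # points inside the unit disk
coords = np.stack([busemann(p1, x), busemann(p2, x)], axis=-1)
```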
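Finally, the Dataset Splits row describes the whitening evaluation protocol from Cho et al. (2019): a 50/50 train/test split, classification on 2-dimensional embeddings, and averaging over 5 embedding configurations. A hedged sketch of that loop is below; the logistic-regression classifier is a Euclidean stand-in for illustration (Cho et al. use a hyperbolic large-margin classifier), and `embeddings_per_config` is an assumed input holding the five pre-computed 2-D embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

def mean_ap(embeddings_per_config, labels, seed=0):
    """Average test AP over several 2-D embedding configurations,
    each scored with a 50% train / 50% test split."""
    scores = []
    for X in embeddings_per_config:  # e.g., 5 configurations
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, labels, test_size=0.5, random_state=seed, stratify=labels)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(average_precision_score(y_te, clf.decision_function(X_te)))
    return float(np.mean(scores))
```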