HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections

Authors: Ines Chami, Albert Gu, Dat P Nguyen, Christopher Ré

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now validate the empirical benefits of HOROPCA on three PCA uses. First, (i) we show that it yields much lower distortion and higher explained variance than existing methods, reducing average distortion by up to 77%. Second, (ii) we validate that it can be used for data pre-processing, improving downstream classification by up to 3.8% in Average Precision score compared to methods that don't use whitening. Finally, (iii) we show that the low-dimensional representations learned by HOROPCA can be visualized to qualitatively interpret hyperbolic data." (A sketch of the average-distortion metric referenced here follows the table.)
Researcher Affiliation | Academia | "Ines Chami*1 Albert Gu*1 Dat Nguyen*1 Christopher Ré1 ... 1Stanford University, CA, USA. Correspondence to: Ines Chami <chami@cs.stanford.edu>."
Pseudocode | No | No structured pseudocode or algorithm blocks were found; the paper describes the algorithm in paragraph text. (An illustrative sketch of the Busemann coordinates underlying horospherical projection follows the table.)
Open Source Code | Yes | "We open-source our implementation [4] and refer to Appendix C for implementation details on how we implemented all baselines and HOROPCA." [4] https://github.com/HazyResearch/HoroPCA
Open Datasets | Yes | "For dimensionality reduction experiments, we consider standard hierarchical datasets previously used to evaluate the benefits of hyperbolic embeddings. More specifically, we use the datasets in (Sala et al., 2018) including a fully balanced tree, a phylogenetic tree, a biological graph comprising of diseases relationships and a graph of Computer Science (CS) Ph.D. advisor-advisee relationships." and "For data whitening experiments, we reproduce the experimental setup from (Cho et al., 2019) and use the Polbooks, Football and Polblogs datasets which have 105, 115 and 1224 nodes each."
Dataset Splits | Yes | "We reproduce the experimental setup from (Cho et al., 2019) who split the datasets in 50% train and 50% test sets, run classification on 2-dimensional embeddings and average results over 5 different embedding configurations as was done in the original paper (Table 3)." (A sketch of this evaluation loop follows the table.)
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or cloud instance types) used for running the experiments were found in the paper.
Software Dependencies | No | No specific ancillary software details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4), were found in the paper.
Experiment Setup | No | No specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings were found in the main text. The paper mentions reproducing the setup from prior work and gives only high-level steps for whitening and evaluation.
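For context on the distortion numbers quoted in the Research Type row, below is a minimal sketch of how average distortion is commonly computed for hyperbolic dimensionality reduction: the mean relative change in pairwise Poincaré-ball distance after projecting to a lower dimension. The helper names here are assumptions for illustration; the paper's actual evaluation code lives in the linked repository.

```python
import numpy as np

def poincare_dist(x, y, eps=1e-9):
    """Hyperbolic distance between two points in the Poincare ball model."""
    sq = np.sum((x - y) ** 2)
    nx = np.sum(x ** 2)
    ny = np.sum(y ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - nx) * (1 - ny) + eps))

def average_distortion(X_high, X_low):
    """Mean of |d_low - d_high| / d_high over all point pairs, where
    d_high/d_low are distances before/after dimensionality reduction."""
    n = len(X_high)
    ratios = []
    for i in range(n):
        for j in range(i + 1, n):
            d0 = poincare_dist(X_high[i], X_high[j])
            d1 = poincare_dist(X_low[i], X_low[j])
            ratios.append(abs(d1 - d0) / d0)
    return float(np.mean(ratios))
```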
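Since the paper itself has no pseudocode block (Pseudocode row), here is an illustrative, non-authoritative sketch of the building block behind horospherical projection: the Busemann function of an ideal point p on the boundary of the Poincaré ball, whose level sets are horospheres. The formula B_p(x) = log(||p - x||^2 / (1 - ||x||^2)) is the standard one for this model; treating these values as coordinates is the idea the method builds on, but this is not the authors' implementation.

```python
import numpy as np

def busemann(p, x, eps=1e-9):
    """Busemann function B_p(x) in the Poincare ball for an ideal
    point p with ||p|| = 1; its level sets are horospheres at p."""
    num = np.sum((p - x) ** 2, axis=-1)
    den = 1.0 - np.sum(x ** 2, axis=-1)
    return np.log(num / (den + eps))

# Toy usage: Busemann coordinates of points w.r.t. two ideal directions.
p1 = np.array([1.0, 0.0])
p2 = np.array([0.0, 1.0])
x = np.array([[0.3, 0.1], [-0.2, 0.4]])  # points inside the unit disk
coords = np.stack([busemann(p1, x), busemann(p2, x)], axis=-1)
```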
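Finally, the Dataset Splits row describes the whitening evaluation protocol from Cho et al. (2019): a 50/50 train/test split, classification on 2-dimensional embeddings, and averaging over 5 embedding configurations. A hedged sketch of that loop is below; the logistic-regression classifier is a Euclidean stand-in for illustration (Cho et al. use a hyperbolic large-margin classifier), and `embeddings_per_config` is an assumed input holding the five pre-computed 2-D embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

def mean_ap(embeddings_per_config, labels, seed=0):
    """Average test AP over several 2-D embedding configurations,
    each scored with a 50% train / 50% test split."""
    scores = []
    for X in embeddings_per_config:  # e.g., 5 configurations
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, labels, test_size=0.5, random_state=seed, stratify=labels)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(average_precision_score(y_te, clf.decision_function(X_te)))
    return float(np.mean(scores))
```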