Hyperbolic Diffusion Embedding and Distance for Hierarchical Representation Learning
Authors: Ya-Wei Eileen Lin, Ronald R. Coifman, Gal Mishne, Ronen Talmon
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experimental Results: We investigate the proposed HDE and HDD in hierarchical graph embedding and distance recovery contexts. Specifically, we apply it to (i) several graphs serving as benchmarks for hierarchical graph embedding, (ii) single-cell gene expression data for the recovery of the hidden hierarchical structure, and (iii) unsupervised hierarchical metric learning tasks. |
| Researcher Affiliation | Academia | 1) Viterbi Faculty of Electrical and Computer Engineering, Technion, Haifa, Israel; 2) Department of Mathematics, Yale University, New Haven, CT, USA; 3) Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA. |
| Pseudocode | Yes | Algorithm 1 Hyperbolic Diffusion Embedding and Distance |
| Open Source Code | Yes | The code is available at the link https://github.com/Ya-Wei0/HyperbolicDiffusionDistance. |
| Open Datasets | Yes | We test two scRNA-seq datasets taken from (Dumitrascu et al., 2021): (i) the mouse cortex and hippocampus dataset (Zeisel) consisting of 3005 single-cells with seven cell types and 4000 gene markers (Zeisel et al., 2015), and (ii) the cord blood mononuclear cell study (CBMC) comprising 8617 single-cells with 13 cell types and 500 gene markers (Stoeckius et al., 2017). and We consider four datasets from the UCI Machine Learning Repository (Dua & Graff, 2017): (i) the Zoo dataset consisting of 101 data points of seven types of animals with 17 features, (ii) the Iris dataset comprising 150 samples from three kinds of Iris plants with four features, (iii) the Glass dataset containing 214 instances of six classes with 10 features, and (iv) the image segmentation (ImaSeg) dataset consisting of 2310 instances from seven outdoor images with 19 features. |
| Dataset Splits | Yes | The reported classification accuracy is obtained by averaging over ten different runs; in each run, the dataset is randomly split into 80% training set and 20% testing set. and We use cross-validation with ten repetitions, in which the dataset is randomly divided into 80% training set and 20% testing set. (A minimal sketch of this split-and-average protocol is given after the table.) |
| Hardware Specification | Yes | The experiments are performed on NVIDIA GTX 1080 Ti GPU. |
| Software Dependencies | No | We use the PyTorch code in (Gu et al., 2018) for Poincaré embedding (PE)... We use the PyTorch code of an SGD-based algorithm... and A distance based on the cosine similarity (sklearn.metrics.pairwise_distances) computed in the ambient space is used in Eq. (1)... It mentions software names like “PyTorch” and “sklearn” but does not specify version numbers. (A sketch of such an ambient-space cosine affinity is given after the table.) |
| Experiment Setup | Yes | For this purpose, we apply Algorithm 1 with α = 1/2 and K ∈ {0, 1, ..., 19}. and the parameter α = 1/2 and the maximal scale K ∈ {0, 1, ..., 19}. (An illustrative sketch of this multiscale setup is given below.) |
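
As a sketch of the split-and-average protocol quoted in the Dataset Splits row, the snippet below draws ten random 80%/20% train/test splits and averages the test accuracy. The k-NN classifier and the `X`, `y` placeholders are assumptions for illustration; the paper's actual classifier and feature pipeline may differ.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def average_accuracy(X, y, n_runs=10, test_size=0.2):
    """Average test accuracy over `n_runs` random 80/20 splits."""
    accs = []
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed, stratify=y
        )
        # k-NN is a placeholder classifier; the paper's pipeline may differ.
        clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
    return float(np.mean(accs)), float(np.std(accs))
```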
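The Software Dependencies row quotes the use of `sklearn.metrics.pairwise_distances` with a cosine-based distance computed in the ambient space for the affinity in Eq. (1). The snippet below is a minimal sketch of one such construction; the Gaussian kernel form and the median bandwidth heuristic are assumptions, not the paper's exact choices.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def cosine_affinity(X):
    """Gaussian affinity built on ambient-space cosine distances (assumed kernel form)."""
    D = pairwise_distances(X, metric="cosine")   # pairwise cosine distances in the ambient space
    sigma = np.median(D[D > 0])                  # assumed median bandwidth heuristic
    return np.exp(-(D ** 2) / sigma ** 2)        # affinity matrix fed to an Eq. (1)-style kernel
```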
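The Experiment Setup row quotes Algorithm 1 being run with α = 1/2 and scales K ∈ {0, 1, ..., 19}. The sketch below shows one possible reading of that input: a density-normalized diffusion operator (the standard diffusion-maps α-normalization with α = 1/2) evaluated at dyadic scales 2^k, one per scale index k = 0, ..., 19. This is an illustrative reconstruction, not a transcription of the paper's Algorithm 1.

```python
import numpy as np

def diffusion_operator(W, alpha=0.5):
    """Row-stochastic diffusion operator with alpha density normalization."""
    d = W.sum(axis=1)
    W_alpha = W / np.outer(d ** alpha, d ** alpha)        # remove density effects (alpha = 1/2)
    return W_alpha / W_alpha.sum(axis=1, keepdims=True)   # normalize rows to sum to one

def multiscale_operators(W, max_scale=19, alpha=0.5):
    """Diffusion operators at dyadic scales 2**k for k = 0, ..., max_scale (assumed reading)."""
    P_k = diffusion_operator(W, alpha)
    operators = []
    for k in range(max_scale + 1):
        operators.append(P_k)   # operator at dyadic scale 2**k
        P_k = P_k @ P_k         # square to move to the next dyadic scale
    return operators
```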