Fitting trees to $\ell_1$-hyperbolic distances
Authors: Joon-Hyeok Yim, Anna Gilbert
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we run HCCROOTEDTREEFIT on several different types of data sets, those that are standard for geometric graph neural nets and those that are synthetic. We also compare our results with other known algorithms. We conclude that HCCROOTEDTREEFIT (HCC) is optimal when the data sets are close to tree-like and when we measure with respect to distortion in the ℓ1 sense and running time. It is, however, suboptimal in terms of the ℓ∞ measure of distortion (as is to be expected). We also conclude that purportedly hierarchical data sets do not, in fact, embed into trees with low distortion, suggesting that geometric graph neural nets should be configured with different geometric considerations. Appendix 6.9 contains further details regarding the experiments. Table 2: Connection between hyperbolicity and average hyperbolicity and how these quantities determine the average distortion ($\|d - d_T\|_1 / n^2$) of the resulting tree metric. |
| Researcher Affiliation | Academia | The provided text on the first page lists the authors as 'Joon-Hyeok Yim Anna C. Gilbert' and mentions '37th Conference on Neural Information Processing Systems (NeurIPS 2023)'. However, no explicit university names, company names, or email domains are stated in the provided text to definitively determine the affiliation type. Given that NeurIPS is a prominent academic conference, the affiliation is inferred to be academic. |
| Pseudocode | Yes | Algorithm 1 ISHIGHLYCONNECTED: tests if two clusters are highly connected. Algorithm 2 HCCTRIANGLE: Find HCC which respects triangle objectives. Algorithm 3 HCCULTRAFIT: Ultrametric fitting algorithm, uses HCCTRIANGLE. Algorithm 4 Find a tree fitting given an ultrametric fitting procedure. |
| Open Source Code | No | The paper mentions adapting code from TREEREP, stating 'We adapted code from TREEREP, which is available at https://github.com/rsonthal/TreeRep.' However, it does not provide an explicit statement or link for the open-source code of their own methodology (HCCROOTEDTREEFIT or related algorithms). |
| Open Datasets | Yes | The data sets we used are C-ELEGAN, CS PHD from [14], and CORA, AIRPORT from [15]. (For those which contain multiple components, we chose the largest connected component.) Given an unweighted graph, we computed its shortest-path distance matrix and used that input to obtain a tree metric. |
| Dataset Splits | No | The paper does not explicitly state the use of a validation dataset split, nor does it provide specific percentages or sample counts for training, validation, and test splits. It mentions running algorithms multiple times (e.g., '100 times' or '50 trials') but not specific data partitioning for validation. |
| Hardware Specification | Yes | All experiments have used shared-use computing resources. There are 4 CPU cores each with 5GiB memory. We executed all the code using a Jupyter notebook interface running Python 3.11.0. |
| Software Dependencies | Yes | All experiments have used shared-use computing resource... We executed all the code using a Jupyter notebook interface running Python 3.11.0 with numpy 1.24.2, scipy 1.10.0, and networkx 3.0. |
| Experiment Setup | Yes | As TREEREP is a randomized algorithm and HCCROOTEDTREEFIT and Gromov's algorithm depend on the choice of a pivot vertex, we run all of these algorithms 100 times and report the average error with standard deviation. All edges with negative weights have been modified to weight 0, as TREEREP and NEIGHBORJOIN both occasionally produce edges with negative weights. For synthetic data set experiments, we run nseed = 50 times. For the comparison, we have fixed the random seed using the numpy library (from 0 to nseed − 1). |
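The preprocessing described in the Open Datasets row (take the largest connected component of each unweighted graph, then compute its shortest-path distance matrix as input to tree fitting) can be sketched with the paper's stated dependencies, networkx and numpy. The function name `shortest_path_matrix` is ours, not from the paper.

```python
import networkx as nx
import numpy as np

def shortest_path_matrix(G):
    """Keep the largest connected component of an unweighted graph and
    return its shortest-path distance matrix (the tree-fitting input)."""
    # Select the largest connected component, as described in the paper.
    nodes = max(nx.connected_components(G), key=len)
    H = nx.convert_node_labels_to_integers(G.subgraph(nodes).copy())
    n = H.number_of_nodes()
    D = np.zeros((n, n))
    # BFS from every vertex gives all-pairs shortest paths on an
    # unweighted graph.
    for src, lengths in nx.all_pairs_shortest_path_length(H):
        for dst, dist in lengths.items():
            D[src, dst] = dist
    return D

# Example: a graph with two components; only the larger one (size 4) is kept.
G = nx.path_graph(4)
G.add_edge(10, 11)
D = shortest_path_matrix(G)
print(D.shape)  # (4, 4)
```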
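The Research Type row refers to hyperbolicity and to the average distortion of a fitted tree metric. A minimal sketch of the two underlying quantities: the standard Gromov four-point condition (which vanishes on tree metrics) and the averaged ℓ1 distortion between a distance matrix and its tree fit. Both function names are ours, and this is an illustration of the standard definitions, not the paper's implementation.

```python
import numpy as np

def four_point_delta(D, x, y, z, w):
    """Gromov four-point value for one quadruple: half the gap between
    the two largest of the three pairwise distance sums; 0 on trees."""
    s = sorted([D[x, y] + D[z, w], D[x, z] + D[y, w], D[x, w] + D[y, z]])
    return (s[2] - s[1]) / 2.0

def avg_l1_distortion(D, DT):
    """Average distortion ||d - d_T||_1 / n^2 between the input metric D
    and a fitted tree metric DT."""
    n = D.shape[0]
    return float(np.abs(D - DT).sum()) / n**2

# A 4-cycle is not a tree metric: its four-point value is 1.
C4 = np.array([[0, 1, 2, 1],
               [1, 0, 1, 2],
               [2, 1, 0, 1],
               [1, 2, 1, 0]], dtype=float)
print(four_point_delta(C4, 0, 1, 2, 3))  # 1.0
```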