Matrix factorisation and the interpretation of geodesic distance
Authors: Nick Whiteley, Annie Gray, Patrick Rubin-Delanchy
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our mathematical results with experimental evidence, obtained from both simulated and real data, suggesting that alternative combinations of matrix-factorisation and dimension-reduction techniques work too. For the former, we consider the popular node2vec [20] algorithm, a likelihood-based approach for graphs said to perform matrix factorisation implicitly [43, 64]. For the latter, we consider the popular t-SNE [37] and UMAP [39] algorithms. As predicted by the theory, a direct low-dimensional matrix factorisation, whether using spectral embedding or node2vec, is less successful. |
| Researcher Affiliation | Academia | Nick Whiteley University of Bristol nick.whiteley@bristol.ac.uk Annie Gray University of Bristol annie.gray@bristol.ac.uk Patrick Rubin-Delanchy University of Bristol patrick.rubin-delanchy@bristol.ac.uk |
| Pseudocode | Yes | Algorithm 1 Isomap procedure input p-dimensional points ˆX1, . . . , ˆXn 1: Compute the neighbourhood graph of radius : a weighted graph connecting i and j, with weight k ˆXi ˆXjk, if k ˆXi ˆXjk 2: Compute the matrix of shortest paths on the neighbourhood graph, ˆDM 2 Rn n 3: Apply classical multidimensional scaling (CMDS) to ˆDM into Rd return d-dimensional points ˆZ1, . . . , ˆZn |
| Open Source Code | Yes | See Section 4 and https://github.com/anniegray52/graphs. |
| Open Datasets | Yes | For this example, we use a publically available, clean version [52] (with license details therein) of the crowdsourced Open Sky dataset... These data are open source, originate from Berkeley Earth [1] and the particular data set analyzed is from [3]. We used open source latitude and longitude data from [4]. |
| Dataset Splits | No | The paper does not explicitly mention train, validation, or test dataset splits in terms of percentages or sample counts for experimental reproduction. |
| Hardware Specification | No | The experiments take a few hours on a standard desktop (and the same is a loose upper bound on compute time for the other experiments in the paper). |
| Software Dependencies | No | The R packages irlba and RSpectra provide fast solutions which can exploit sparse inputs... The method of [65], based on profile-likelihood, provides a popular, practical choice, taking as input the spectrum of A, and is implemented in the R package igraph. |
| Experiment Setup | Yes | Focusing first on the dense regime, for each n = 100, 400, 1600, 6400, we simulate a graph, and its spectral embedding ˆX1, . . . , ˆXn into p = 5 dimensions. The first three dimensions of this embedding are shown in Figure 2a), and we see that the points gather about a two-dimensional, curved, manifold. To approximate geodesic distance, we compute a neighbourhood graph of radius from ˆX1, . . . , ˆXn, choosing as small as possible subject to maintaining a connected graph... We use default configurations for all other hyperparameters. |