Matrix factorisation and the interpretation of geodesic distance

Authors: Nick Whiteley, Annie Gray, Patrick Rubin-Delanchy

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We complement our mathematical results with experimental evidence, obtained from both simulated and real data, suggesting that alternative combinations of matrix-factorisation and dimension-reduction techniques work too. For the former, we consider the popular node2vec [20] algorithm, a likelihood-based approach for graphs said to perform matrix factorisation implicitly [43, 64]. For the latter, we consider the popular t-SNE [37] and UMAP [39] algorithms. As predicted by the theory, a direct low-dimensional matrix factorisation, whether using spectral embedding or node2vec, is less successful.
Researcher Affiliation Academia Nick Whiteley University of Bristol nick.whiteley@bristol.ac.uk Annie Gray University of Bristol annie.gray@bristol.ac.uk Patrick Rubin-Delanchy University of Bristol patrick.rubin-delanchy@bristol.ac.uk
Pseudocode Yes Algorithm 1 Isomap procedure input p-dimensional points ˆX1, . . . , ˆXn 1: Compute the neighbourhood graph of radius : a weighted graph connecting i and j, with weight k ˆXi ˆXjk, if k ˆXi ˆXjk 2: Compute the matrix of shortest paths on the neighbourhood graph, ˆDM 2 Rn n 3: Apply classical multidimensional scaling (CMDS) to ˆDM into Rd return d-dimensional points ˆZ1, . . . , ˆZn
Open Source Code Yes See Section 4 and https://github.com/anniegray52/graphs.
Open Datasets Yes For this example, we use a publically available, clean version [52] (with license details therein) of the crowdsourced Open Sky dataset... These data are open source, originate from Berkeley Earth [1] and the particular data set analyzed is from [3]. We used open source latitude and longitude data from [4].
Dataset Splits No The paper does not explicitly mention train, validation, or test dataset splits in terms of percentages or sample counts for experimental reproduction.
Hardware Specification No The experiments take a few hours on a standard desktop (and the same is a loose upper bound on compute time for the other experiments in the paper).
Software Dependencies No The R packages irlba and RSpectra provide fast solutions which can exploit sparse inputs... The method of [65], based on profile-likelihood, provides a popular, practical choice, taking as input the spectrum of A, and is implemented in the R package igraph.
Experiment Setup Yes Focusing first on the dense regime, for each n = 100, 400, 1600, 6400, we simulate a graph, and its spectral embedding ˆX1, . . . , ˆXn into p = 5 dimensions. The first three dimensions of this embedding are shown in Figure 2a), and we see that the points gather about a two-dimensional, curved, manifold. To approximate geodesic distance, we compute a neighbourhood graph of radius from ˆX1, . . . , ˆXn, choosing as small as possible subject to maintaining a connected graph... We use default configurations for all other hyperparameters.