Fast geometric learning with symbolic matrices

Authors: Jean Feydy, Joan Alexis Glaunès, Benjamin Charlier, Michael M. Bronstein

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform an extensive evaluation on a broad class of problems: Gaussian modelling, K-nearest neighbors search, geometric deep learning, non-Euclidean embeddings and optimal transport theory. In practice, for geometric problems that involve 10³ to 10⁶ samples in dimension 1 to 100, our library speeds up baseline GPU implementations by up to two orders of magnitude." (A minimal usage sketch follows the table.)
Researcher Affiliation | Collaboration | Jean Feydy* (Imperial College London, jfeydy@ic.ac.uk); Joan Alexis Glaunès* (Université de Paris, alexis.glaunes@parisdescartes.fr); Benjamin Charlier* (Université de Montpellier, benjamin.charlier@umontpellier.fr); Michael M. Bronstein (Imperial College London / Twitter, m.bronstein@imperial.ac.uk)
Pseudocode | Yes | "Figure 2: We rely on fast parallel schemes to compute reductions of symbolic matrices, as in Eq. (1). (a) On the CPU, each thread i computes a value aᵢ by looping over the reduction index j and consuming the values of F on-the-fly. (b) On the GPU, we cut (a) into K-by-K tiles (where K is the CUDA block size) to leverage the low latency of the shared memory buffer and block-wise memory accesses." (A tiling analogue follows the table.)
Open Source Code | Yes | "We refer to our online documentation (www.kernel-operations.io)"
Open Datasets | Yes | "We perform numerical experiments with random normal samples and freely available datasets: digits from Scikit-Learn [87], Stanford dragon [26], ShapeNet [19]. MNIST [73], SIFT [62], GloVe-25 and GloVe-100 [88] were taken from the ANN-benchmarks repository [4], while HyperE-10 and HyperE-50 are hyperbolic embeddings processed from WordNet datasets [95]."
Dataset Splits | No | The paper uses standard datasets such as MNIST and ShapeNet but does not explicitly state the training, validation, and test splits (percentages, counts, or split methodology) used in its experiments. While these datasets often come with standard splits, the paper does not specify how they were used.
Hardware Specification | Yes | "All benchmarks were performed on a workstation equipped with 8 Intel Xeon Gold 6142 CPU @ 2.60GHz cores (16 threads), 128 GB of RAM and an Nvidia RTX 2080 Ti GPU with 11 GB of device memory."
Software Dependencies | No | The paper mentions software such as PyTorch, NumPy, Matlab, R, JAX/XLA, and scipy.sparse.linalg, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | "Table 1: Fitting a Gaussian Mixture Model: we perform 10 iterations of the standard EM algorithm with N points and K components in dimension D." "Table 2: KNN search: average queries per second with a dataset of N points in dimension D. We work with batches of 10k queries at a time and K = 10 neighbors." (A KNN sketch follows the table.)
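
To make the symbolic-matrix reductions of Eq. (1) concrete, here is a minimal PyKeOps sketch; the shapes and variable names are illustrative, not taken from the paper. It builds a symbolic Gaussian kernel matrix and reduces it over j without ever materializing the M-by-N array:

```python
import torch
from pykeops.torch import LazyTensor

M, N, D = 100_000, 200_000, 3   # illustrative problem sizes
x = torch.randn(M, D)           # target points x_i
y = torch.randn(N, D)           # source points y_j
b = torch.randn(N, 1)           # signal carried by the y_j

x_i = LazyTensor(x[:, None, :])    # symbolic (M, 1, D) tensor
y_j = LazyTensor(y[None, :, :])    # symbolic (1, N, D) tensor
D_ij = ((x_i - y_j) ** 2).sum(-1)  # symbolic (M, N) squared distances
K_ij = (-D_ij).exp()               # symbolic Gaussian kernel matrix

a = K_ij @ b   # reduction over j: entries of K_ij are computed on the fly
```

The (M, N) matrix K_ij is stored as a formula plus the data arrays x and y, which is what lets the reduction run in O(M + N) memory instead of O(M N).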
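The GPU scheme of Figure 2(b) streams K-by-K tiles of the symbolic matrix through shared memory. The NumPy sketch below is only a CPU analogue of that tiling logic, not KeOps' actual CUDA code; the function name and tile size are ours:

```python
import numpy as np

def tiled_gaussian_matvec(x, y, b, tile=64):
    """Accumulate a_i = sum_j exp(-|x_i - y_j|^2) * b_j tile by tile:
    only one tile of (y_j, b_j) is sliced out at a time, standing in
    for the shared-memory loads of the CUDA scheme."""
    M, N = x.shape[0], y.shape[0]
    a = np.zeros((M, b.shape[1]))
    for j0 in range(0, N, tile):           # loop over column tiles
        y_tile = y[j0:j0 + tile]           # "load" a tile of sources
        b_tile = b[j0:j0 + tile]
        sq_dists = ((x[:, None, :] - y_tile[None, :, :]) ** 2).sum(-1)
        a += np.exp(-sq_dists) @ b_tile    # partial reduction over this tile
    return a
```

On the GPU, each CUDA block additionally owns a tile of the i indices, so both loops run in parallel; this sketch only reproduces the streaming over j.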
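For the KNN benchmark of Table 2, a matching PyKeOps sketch (sizes and names are again illustrative; K = 10 neighbors and 10k-query batches as in the paper) could read:

```python
import torch
from pykeops.torch import LazyTensor

data = torch.randn(1_000_000, 3)   # N dataset points in dimension D = 3
queries = torch.randn(10_000, 3)   # one batch of 10k queries

q_i = LazyTensor(queries[:, None, :])   # symbolic (10k, 1, D) tensor
d_j = LazyTensor(data[None, :, :])      # symbolic (1, N, D) tensor
dist_ij = ((q_i - d_j) ** 2).sum(-1)    # symbolic (10k, N) distance matrix

knn_indices = dist_ij.argKmin(10, dim=1)   # (10k, 10) nearest-neighbor indices
```

Throughput figures like those reported in Table 2 would then come from timing this argKmin reduction over successive query batches.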