Scalable Nearest Neighbor Search for Optimal Transport

Authors: Arturs Backurs, Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We perform extensive experimental evaluation of nearest neighbor search algorithms in the W1 distance on realworld dataset. Our results show that compared to previous state of the art, Flowtree achieves up to 7.4 times faster running time. ... In this section we empirically evaluate Flowtree and compare it to various existing methods.
Researcher Affiliation Collaboration 1Author names are ordered alphabetically. Code available at https://github.com/ilyaraz/ot_estimators. 2TTIC 3Microsoft 4MIT 5Microsoft Research.
Pseudocode No The paper describes algorithms in prose but does not include structured pseudocode or clearly labeled algorithm blocks.
Open Source Code Yes Code available at https://github.com/ilyaraz/ot_estimators.
Open Datasets Yes We use the standard benchmark 20news dataset of news-related online discussion groups, and a dataset of Amazon reviews... We use the MNIST dataset of handwritten digits.
Dataset Splits No The paper mentions tuning parameters on 'a random subset of 300 queries' but does not provide explicit training, validation, and test dataset splits with percentages or sample counts for model training.
Hardware Specification Yes All running times are measured on a Standard F72s v2 Microsoft Azure instance equipped with Intel Xeon Platinum 8168 CPU.
Software Dependencies No The paper mentions software like Num Py, Open BLAS, Python, C++, Eigen, POT, and Lemon graph library but does not provide specific version numbers for any of them.
Experiment Setup No A pipeline with ℓalgorithms has parameters c1, . . . , cℓ 1, where ci is the number of output candidates (non-pruned points) of the ith algorithm in the pipeline. ... The optimal parameters are listed in the appendix.