Linear-Complexity Data-Parallel Earth Mover’s Distance Approximations
Authors: Kubilay Atasu, Thomas Mittelholzer
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the popular text-based 20 Newsgroups dataset, the new algorithms are four orders of magnitude faster than a multi-threaded CPU implementation of Word Mover's Distance and match its nearest-neighbors-search accuracy. On MNIST images, the new algorithms are four orders of magnitude faster than a GPU implementation of Sinkhorn's algorithm while offering a slightly higher nearest-neighbors-search accuracy. |
| Researcher Affiliation | Collaboration | 1IBM Research Zurich 2HSR Hochschule für Technik, Rapperswil. Correspondence to: Kubilay Atasu <kat@zurich.ibm.com>. |
| Pseudocode | Yes | Algorithm 1 Optimal Computation of OMR; Algorithm 2 Optimal Computation of ICT; Algorithm 3 Approximate Computation of ICT |
| Open Source Code | No | No explicit statement about providing source code for their own methodology was found. The paper mentions using 'Cuturi's open-source implementation of Sinkhorn's algorithm'. |
| Open Datasets | Yes | We performed experiments on two public datasets: 20 Newsgroups is a text database of newsgroup documents, partitioned evenly across 20 different classes1, and MNIST is an image database of greyscale hand-written digits that are partitioned evenly across 10 classes2. 1http://qwone.com/~jason/20Newsgroups/; 2http://yann.lecun.com/exdb/mnist/ |
| Dataset Splits | No | No specific dataset split information for training, validation, and test sets (e.g., exact percentages, sample counts, or citations to predefined splits) was provided. The paper mentions 'We used only the first 6000 MNIST training images as our query documents and compared them with the complete set of 60000 MNIST training images,' which describes a query setup rather than a standard model training split. |
| Hardware Specification | Yes | We have developed GPU-accelerated implementations of the WCD, RWMD, OMR, ACT, and BoW cosine similarity methods and evaluated their performance on an NVIDIA® GTX 1080Ti GPU. Cuturi's Sinkhorn implementation has also been executed on the same GPU. Our multithreaded WMD implementation has been deployed on an 8-core Intel® i7-6900K CPU. |
| Software Dependencies | No | No specific software dependencies with version numbers were provided. The paper mentions the 'FastEMD library' and 'Cuturi's open-source implementation of Sinkhorn's algorithm' but without version details. |
| Experiment Setup | Yes | We used λ = 20 as the entropic regularization parameter because it offered a good trade-off between the accuracy and the speed of Sinkhorn's algorithm. Word2Vec embedding vectors are L2-normalized, but MNIST embedding vectors are not normalized in our setup. In addition, when computing our approximations, the histogram weights are always L1-normalized. Lastly, the transportation cost between two words or pixels is the Euclidean (L2) distance between their embedding vectors. |
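The setup described in this row (L2-normalized embeddings, L1-normalized histogram weights, Euclidean transportation costs, and entropy-regularized transport with λ = 20) can be sketched as follows. This is a minimal illustration using NumPy, not the authors' code; the function names and the fixed iteration count are our own assumptions, and the `sinkhorn_distance` helper follows Cuturi's (2013) scaling iterations rather than the paper's GPU implementation.

```python
import numpy as np

def preprocess(embeddings, weights, l2_normalize=True):
    """L2-normalize embedding vectors (as done for Word2Vec in the paper;
    MNIST embeddings were left unnormalized) and L1-normalize the
    histogram weights so they form a probability distribution."""
    E = np.asarray(embeddings, dtype=np.float64)
    if l2_normalize:
        E = E / np.linalg.norm(E, axis=1, keepdims=True)
    w = np.asarray(weights, dtype=np.float64)
    return E, w / w.sum()

def cost_matrix(E1, E2):
    """Pairwise Euclidean (L2) transportation costs between embeddings."""
    diff = E1[:, None, :] - E2[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sinkhorn_distance(w1, w2, C, lam=20.0, n_iter=200):
    """Entropy-regularized transport cost via Sinkhorn scaling iterations,
    with regularization parameter lam = 20 as used in the experiments.
    A fixed iteration count is assumed here for simplicity."""
    K = np.exp(-lam * C)          # Gibbs kernel
    u = np.ones_like(w1)
    for _ in range(n_iter):       # alternate row/column scaling
        v = w2 / (K.T @ u)
        u = w1 / (K @ v)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return (P * C).sum()
```

For example, with two unit-mass histograms over two embedding points at Euclidean distance 1, the regularized cost returned is close to zero because almost all mass stays on the diagonal of the plan.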