Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond

Authors: Lin Chen, Hossein Esfandiari, Gang Fu, Vahab Mirrokni

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the first part, we verify the theoretical bounds derived in Theorem 1 on real data. We used latent Dirichlet allocation to extract the topic distributions of Reuters-21578, Distribution 1.0. The number of topics is set to 10. We sampled 100 documents uniformly at random and computed the GJS divergence and Hellinger distance between each pair of topic distributions. Each dot in Fig. 1 represents the topic distribution of a document. The horizontal axis denotes the Hellinger distance while the vertical axis denotes the GJS divergence. We chose different parameter values (λ = 1/2, 1/3, 1/10) for the GJS divergence. From the three subfigures, we observe that both the upper and lower bounds are tight for the data. In the second part, we apply the proposed LSH scheme for the GJS divergence to the nearest neighbor search problem in Fashion MNIST [39], MNIST [23], and CIFAR-10 [21]. Each image in the datasets is flattened into a vector and L1-normalized, thereby summing to 1. As described in Section 2.1, a concatenation of hash functions is used. We denote the number of concatenated hash functions by K and the number of compound hash functions by L. In the first set of experiments, we set K = 3 and vary L from 20 to 40. We measure the execution time of LSH-based k-nearest neighbor search and the exact (brute-force) algorithm, where k is set to 20.
Researcher Affiliation | Collaboration | Lin Chen (1,2), Hossein Esfandiari (2), Thomas Fu (2), Vahab S. Mirrokni (2); 1: Yale University, 2: Google Research; lin.chen@yale.edu, {esfandiari,thomasfu,mirrokni}@google.com
Pseudocode | Yes | Algorithm 1 Kreĭn-LSH. Input: Discretization parameters J ∈ ℕ and ε > 0. Output: The left and right Kreĭn transforms η1 and η2.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it state that the code is open-source or provide a link.
Open Datasets | Yes | We used latent Dirichlet allocation to extract the topic distributions of Reuters-21578, Distribution 1.0. ... we apply the proposed LSH scheme for the GJS divergence to the nearest neighbor search problem in Fashion MNIST [39], MNIST [23], and CIFAR-10 [21].
Dataset Splits | No | The paper mentions using datasets for experiments (Reuters-21578, Fashion MNIST, MNIST, CIFAR-10) but does not provide specific details on training, validation, or test data splits (e.g., percentages, sample counts, or explicit splitting methodologies).
Hardware Specification | Yes | Both algorithms were run on a 2.2 GHz Intel Core i7 processor.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | In the first set of experiments, we set K = 3 and vary L from 20 to 40. We measure the execution time of LSH-based k-nearest neighbor search and the exact (brute-force) algorithm, where k is set to 20. ... We also vary the parameter of the GJS divergence and choose λ from {1/2, 1/3, 1/10}. ... In the second set of experiments, we fix the parameter of the GJS divergence to 1/2; i.e., the JS divergence is used. The number of concatenated hash functions K ranges from 3 to 5 or 4 to 6.
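The evidence above compares the GJS divergence against the Hellinger distance between pairs of topic distributions. A minimal sketch of both quantities for discrete distributions, assuming the standard skewed form GJS_λ(P, Q) = λ·KL(P‖M) + (1−λ)·KL(Q‖M) with M = λP + (1−λ)Q (λ = 1/2 recovers the ordinary JS divergence); the function names are illustrative, not taken from the paper:

```python
import numpy as np

def kl(p, q):
    # Discrete KL divergence; terms with p_i = 0 contribute 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def gjs(p, q, lam=0.5):
    # Generalized Jensen-Shannon divergence with skew parameter lam
    # (assumed definition; lam = 1/2 gives the JS divergence).
    m = lam * p + (1.0 - lam) * q
    return lam * kl(p, m) + (1.0 - lam) * kl(q, m)

def hellinger(p, q):
    # Hellinger distance between discrete distributions, in [0, 1].
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# Example: the three skew values used in the paper's first experiment.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])
for lam in (1 / 2, 1 / 3, 1 / 10):
    print(lam, gjs(p, q, lam), hellinger(p, q))
```

With λ = 1/2 the value is bounded above by ln 2, which is why the divergence and the (bounded) Hellinger distance can be meaningfully plotted against each other as in Fig. 1.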
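The setup rows describe K concatenated hash bits per table (AND) and L compound hash tables (OR) for LSH-based k-NN search. The skeleton below illustrates only that AND/OR structure, using a generic sign-of-random-projection hash on √p as a stand-in hash family related to Hellinger geometry; it is not the paper's GJS/Kreĭn scheme, and all names are illustrative. Candidates are re-ranked by exact Hellinger distance:

```python
import numpy as np

def build_tables(X, K=3, L=20, seed=0):
    # X: (n, d) array whose rows are L1-normalized distributions.
    # K random hyperplanes per table (concatenation / AND),
    # L independent tables (OR), hashing the sqrt-probability vectors.
    rng = np.random.default_rng(seed)
    S = np.sqrt(X)
    planes = rng.standard_normal((L, K, X.shape[1]))
    tables = []
    for l in range(L):
        keys = S @ planes[l].T > 0          # (n, K) sign bits
        table = {}
        for i in range(X.shape[0]):
            table.setdefault(keys[i].tobytes(), []).append(i)
        tables.append(table)
    return planes, tables

def query(q, X, planes, tables, k=20):
    # Collect the union of the query's L buckets, then re-rank the
    # candidates by exact Hellinger distance and return the top k.
    s = np.sqrt(q)
    cand = set()
    for l, table in enumerate(tables):
        key = (planes[l] @ s > 0).tobytes()
        cand.update(table.get(key, []))
    cand = np.array(sorted(cand), dtype=int)
    if cand.size == 0:
        return cand
    d = np.sqrt(0.5 * np.sum((np.sqrt(X[cand]) - s) ** 2, axis=1))
    return cand[np.argsort(d)[:k]]
```

Larger K makes buckets smaller (fewer, higher-precision candidates); larger L raises recall at the cost of more tables, which matches the paper's sweep of K from 3 to 6 and L from 20 to 40.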