Local Density Estimation in High Dimensions
Authors: Xian Wu, Moses Charikar, Vishnu Natchu
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our algorithm uses locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via an unbiased estimator that uses importance sampling. [...] We demonstrate the effectiveness of our algorithm by experiments on a standard word embedding dataset. |
| Researcher Affiliation | Collaboration | ¹Stanford University, USA; ²Laserlike Inc., USA |
| Pseudocode | Yes | Theorem 3.1 (Aggregate-Counts). Given a set of K hash tables, each with 2^t hash buckets with addresses in {0, 1}^t, Aggregate-Counts (Algorithm 1) computes...; Theorem 3.2 (Sampler). ...Hamming-Distance-Sampler (Algorithm 2) generates a sample in time O(t). |
| Open Source Code | No | The paper does not provide any specific links to source code, nor does it state that the code for their methodology is publicly released or available in supplementary materials. |
| Open Datasets | Yes | We use the set of 400,000 pre-trained 50-dimensional word embedding vectors trained from Wikipedia 2014 + Gigaword 5, provided by (Pennington et al., 2014). |
| Dataset Splits | No | The paper uses a pre-trained dataset (GloVe embeddings) and evaluates an estimator on it. It does not describe standard train/validation/test splits for training a model, nor does it provide specific percentages or counts for data partitioning relevant to typical model validation. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, or specific libraries with their versions) that would be needed to reproduce the experiment setup. |
| Experiment Setup | Yes | We also fix t = 20 in all of our experiments, since we have 400,000 embeddings in total and 20 ≈ log2(400,000). [...] In this experiment, we fix our sampling budget to 1000 samples and the table budget to 20 tables. |
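
The method and pseudocode rows above describe LSH preprocessing into K hash tables of 2^t buckets and a Hamming-distance-based sampler. The sketch below is a minimal, hypothetical illustration of that general recipe, assuming random-hyperplane (SimHash) codes; it averages a kernel over uniformly retrieved candidates rather than implementing the paper's unbiased importance-weighted estimator, and none of the function names or details correspond to the paper's Algorithms 1-2.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_tables(X, K=20, t=20):
    """Hash every row of X into K tables of 2^t buckets via random hyperplanes (SimHash)."""
    tables = []
    for _ in range(K):
        H = rng.standard_normal((X.shape[1], t))        # random projection for this table
        codes = (X @ H > 0).astype(np.int64)            # t-bit code per point
        keys = codes @ (1 << np.arange(t))              # pack bits into a bucket address in {0, 1}^t
        buckets = {}
        for idx, key in enumerate(keys):
            buckets.setdefault(int(key), []).append(idx)
        tables.append((H, buckets))
    return tables

def estimate_density(q, X, tables, kernel, budget=1000):
    """Retrieve the query's buckets and average the kernel over a uniform sample.

    NOTE: a simplified stand-in, not the paper's unbiased importance-sampling estimator.
    """
    candidates = []
    for H, buckets in tables:
        code = (q @ H > 0).astype(np.int64)
        key = int(code @ (1 << np.arange(len(code))))
        candidates.extend(buckets.get(key, []))
    if not candidates:
        return 0.0
    sample = rng.choice(candidates, size=min(budget, len(candidates)), replace=True)
    return float(np.mean([kernel(q, X[i]) for i in sample]))
```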
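
A hypothetical reproduction of the quoted experiment setup (400,000 pre-trained 50-dimensional GloVe vectors, t = 20 ≈ log2(400,000), a table budget of 20 tables, and a 1000-sample budget), reusing the helpers from the sketch above. The file name `glove.6B.50d.txt` is the standard distribution name for the Wikipedia 2014 + Gigaword 5 vectors, and the cosine kernel is an illustrative choice; neither is stated in the paper.

```python
import numpy as np

def load_glove(path="glove.6B.50d.txt"):
    """Load the pre-trained 50-dimensional GloVe vectors (Pennington et al., 2014)."""
    words, vecs = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            words.append(parts[0])
            vecs.append(np.asarray(parts[1:], dtype=np.float32))
    return words, np.vstack(vecs)

words, X = load_glove()                        # ~400,000 x 50 embedding matrix
tables = build_tables(X, K=20, t=20)           # table budget: 20 tables, t = 20 hash bits
cosine = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(estimate_density(X[0], X, tables, kernel=cosine, budget=1000))  # 1000-sample budget
```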