Data Structures for Density Estimation
Authors: Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam Narayanan, Sandeep Silwal
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the empirical side, we experimentally evaluate the linear-time algorithm and compare its performance to the (Acharya et al., 2018) algorithm on synthetic and on real networking data. These experiments display the practical benefits of the faster algorithm. For example, for synthetic data, we demonstrate that our faster algorithm achieves over 2x reduction in the number of comparisons needed to achieve the same level of accuracy. Similarly, our experiments on network data show up to 5x reduction in the number of comparisons. |
| Researcher Affiliation | Academia | 1Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. 2Data Science Institute, Columbia University, New York, NY 10027, USA. |
| Pseudocode | Yes | Algorithm 1 Heavy light decomposition; Algorithm 2 Preprocessing; Algorithm 3 Sublinear Time Hypothesis Selection |
| Open Source Code | Yes | Code available at https://github.com/justc2/datastructdensityest |
| Open Datasets | Yes | The underlying network data we use comes from the CAIDA Anonymized Internet Trace internet traffic dataset, which is IP traffic data collected at a backbone link of a Tier1 ISP in a data center in NYC. ... From CAIDA internet traces 2019, https://www.caida.org/catalog/datasets/monitors/passive-equinix-nyc/ |
| Dataset Splits | No | The paper describes its experimental setup including sample sizes and query formation (e.g., 'For a series of 100 query chunks, we use the prior 2,048 chunks as the set of distributions to search over'), but does not explicitly mention or specify training, validation, or testing splits in the conventional machine learning sense for model development. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming language versions, library versions, or specific solver versions) used in the experiments. |
| Experiment Setup | Yes | The algorithms have several key parameters which we vary throughout the experiments. `nAllPairs` is the number of distributions in each level of the tournament which are randomly sampled to compete in an all-pairs tournament at the end of the algorithm. This parameter is used in both the base tournament and our fast tournament. `fastConst` controls the number of samples used in each level of the fast tournament. In particular, at the ith level, the fast tournament uses `fastConst^i` samples for each Scheffe test. |
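The tournament-based algorithms above repeatedly run Scheffe tests, which pit two candidate distributions against each other using samples from the unknown distribution. As context for the experiment-setup row, here is a minimal sketch of a single Scheffe test over a finite domain; the function name and array-based interface are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def scheffe_test(p, q, samples):
    """Decide which of two known candidate distributions p, q
    (probability arrays over a finite domain {0, ..., d-1}) better
    matches the unknown distribution that generated `samples`
    (an integer array of observed domain elements).
    Returns 0 if p wins the test, 1 if q wins.

    Note: this is an illustrative sketch, not the authors' implementation.
    """
    # Scheffe set A = {x : p(x) > q(x)}, where p claims more mass than q
    A = p > q
    # Empirical mass the samples place on A
    emp = np.mean(A[samples])
    # The candidate whose mass on A is closer to the empirical mass wins
    return 0 if abs(p[A].sum() - emp) <= abs(q[A].sum() - emp) else 1
```

In the fast tournament described in the table, the number of samples fed to each such test grows with the tournament level (governed by `fastConst`), so early rounds are cheap and only surviving candidates face higher-sample comparisons.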