reproducibilityindex.ai

Optimal Algorithms for Augmented Testing of Discrete Distributions

Authors: Maryam Aliakbarpour, Piotr Indyk, Ronitt Rubinfeld, Sandeep Silwal

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Furthermore, experimental results show that the performance of our algorithms on real data significantly exceeds our worst-case guarantees for sample complexity, demonstrating the practicality of our approach. We empirically evaluate our augmented closeness testing algorithm on synthetic and real distributions and refer all details to Section E.
Researcher Affiliation	Academia	Maryam Aliakbarpour Rice University Houston, TX maryama@rice.edu Piotr Indyk MIT Cambridge, MA indyk@mit.edu Ronitt Rubinfeld MIT Cambridge, MA ronitt@csail.mit.edu Sandeep Silwal UW-Madison Madison, WI silwal@cs.wisc.edu
Pseudocode	Yes	Algorithm 1 An augmented tester for testing closeness of p and q with a suggested value α
Open Source Code	Yes	We include a link to the CAIDA dataset and a Jupyter notebook is attached to the submission.
Open Datasets	Yes	The data is internet traffic data collected at a backbone link of a Tier1 ISP in a data center9. Using the preprocessing in [1], we construct the empirical distribution over source IP addresses in consecutive chunks of time. ... 9From CAIDA internet traces 2019, see https://www.caida.org/catalog/datasets/monitors/ passive-equinix-nyc
Dataset Splits	No	The paper describes using 'synthetic and real distributions' and sets 'p' and 'q' based on time chunks of the IP traffic data, but it does not specify explicit training, validation, and test splits with percentages or sample counts.
Hardware Specification	No	The paper does not specify any particular hardware, such as GPU or CPU models, or cloud computing instances, used for running the experiments.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., Python version, library versions) in its main text or provided excerpt.
Experiment Setup	Yes	We set n, the domain size, to n = 10^6 and our goal is to determine if we are sampling from (p, p) or (p, q). ... We set p to be distribution for the first chunk of time and we let q be the distribution for the last chunk of time. ... In our experiments, given a sample budget, we compute the empirical distribution of Z for both algorithms under both hypothesis cases across 100 trials.