Optimal Algorithms for Augmented Testing of Discrete Distributions
Authors: Maryam Aliakbarpour, Piotr Indyk, Ronitt Rubinfeld, Sandeep Silwal
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, experimental results show that the performance of our algorithms on real data significantly exceeds our worst-case guarantees for sample complexity, demonstrating the practicality of our approach. We empirically evaluate our augmented closeness testing algorithm on synthetic and real distributions and refer all details to Section E. |
| Researcher Affiliation | Academia | Maryam Aliakbarpour Rice University Houston, TX maryama@rice.edu Piotr Indyk MIT Cambridge, MA indyk@mit.edu Ronitt Rubinfeld MIT Cambridge, MA ronitt@csail.mit.edu Sandeep Silwal UW-Madison Madison, WI silwal@cs.wisc.edu |
| Pseudocode | Yes | Algorithm 1 An augmented tester for testing closeness of p and q with a suggested value α |
| Open Source Code | Yes | We include a link to the CAIDA dataset and a Jupyter notebook is attached to the submission. |
| Open Datasets | Yes | The data is internet traffic data collected at a backbone link of a Tier1 ISP in a data center9. Using the preprocessing in [1], we construct the empirical distribution over source IP addresses in consecutive chunks of time. ... 9From CAIDA internet traces 2019, see https://www.caida.org/catalog/datasets/monitors/ passive-equinix-nyc |
| Dataset Splits | No | The paper describes using 'synthetic and real distributions' and sets 'p' and 'q' based on time chunks of the IP traffic data, but it does not specify explicit training, validation, and test splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not specify any particular hardware, such as GPU or CPU models, or cloud computing instances, used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, library versions) in its main text or provided excerpt. |
| Experiment Setup | Yes | We set n, the domain size, to n = 10^6 and our goal is to determine if we are sampling from (p, p) or (p, q). ... We set p to be distribution for the first chunk of time and we let q be the distribution for the last chunk of time. ... In our experiments, given a sample budget, we compute the empirical distribution of Z for both algorithms under both hypothesis cases across 100 trials. |