Improved Frequency Estimation Algorithms with and without Predictions
Authors: Anders Aamand, Justin Chen, Huy Nguyen, Sandeep Silwal, Ali Vakilian
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches. ... We experimentally evaluate our algorithms with and without predictions on real and synthetic datasets and demonstrate that the improvements predicted by theory hold in practice. |
| Researcher Affiliation | Academia | Anders Aamand MIT aamand@mit.edu Justin Y. Chen MIT justc@mit.edu Huy Lê Nguyê n Northeastern University hu.nguyen@northeastern.edu Sandeep Silwal MIT silwal@mit.edu Ali Vakilian TTIC vakilian@ttic.edu |
| Pseudocode | Yes | Algorithm 1 (Not augmented) Frequency update algorithm; Algorithm 2 (Not augmented) Frequency estimation algorithm; Algorithm 3 (Learning-augmented) Frequency update algorithm; Algorithm 4 (Learning-augmented) Frequency estimation algorithm |
| Open Source Code | No | The paper mentions open-source platforms like Spark and Twitter Algebird, but does not provide a statement or link for the open-source code of its own described methodology. |
| Open Datasets | Yes | We use the same two real-world datasets and predictions from [36]: the CAIDA and AOL datasets. The CAIDA dataset [12] contains 50 minutes of internet traffic data. ... The AOL dataset [55] contains 80 days of internet search queries. |
| Dataset Splits | No | The paper describes the datasets used but does not explicitly provide details about training, validation, or test dataset splits for its own experiments. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions general software like Spark and Algebird in the context of existing implementations but does not list specific software dependencies with version numbers for its own experimental setup. |
| Experiment Setup | Yes | For all implementations, we use three rows in the CS table and vary the number of columns. ... If the median estimate of an element is below a threshold of Cn/w for domain size n, sketch width w (a third of the total space), and a tunable constant C, the estimate is instead set to 0. |