Improved Frequency Estimation Algorithms with and without Predictions

Authors: Anders Aamand, Justin Chen, Huy Nguyen, Sandeep Silwal, Ali Vakilian

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches. ... We experimentally evaluate our algorithms with and without predictions on real and synthetic datasets and demonstrate that the improvements predicted by theory hold in practice.
Researcher Affiliation Academia Anders Aamand MIT aamand@mit.edu Justin Y. Chen MIT justc@mit.edu Huy Lê Nguyê n Northeastern University hu.nguyen@northeastern.edu Sandeep Silwal MIT silwal@mit.edu Ali Vakilian TTIC vakilian@ttic.edu
Pseudocode Yes Algorithm 1 (Not augmented) Frequency update algorithm; Algorithm 2 (Not augmented) Frequency estimation algorithm; Algorithm 3 (Learning-augmented) Frequency update algorithm; Algorithm 4 (Learning-augmented) Frequency estimation algorithm
Open Source Code No The paper mentions open-source platforms like Spark and Twitter Algebird, but does not provide a statement or link for the open-source code of its own described methodology.
Open Datasets Yes We use the same two real-world datasets and predictions from [36]: the CAIDA and AOL datasets. The CAIDA dataset [12] contains 50 minutes of internet traffic data. ... The AOL dataset [55] contains 80 days of internet search queries.
Dataset Splits No The paper describes the datasets used but does not explicitly provide details about training, validation, or test dataset splits for its own experiments.
Hardware Specification No The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments.
Software Dependencies No The paper mentions general software like Spark and Algebird in the context of existing implementations but does not list specific software dependencies with version numbers for its own experimental setup.
Experiment Setup Yes For all implementations, we use three rows in the CS table and vary the number of columns. ... If the median estimate of an element is below a threshold of Cn/w for domain size n, sketch width w (a third of the total space), and a tunable constant C, the estimate is instead set to 0.