Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improved Frequency Estimation Algorithms with and without Predictions

Authors: Anders Aamand, Justin Chen, Huy Nguyen, Sandeep Silwal, Ali Vakilian

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches. ... We experimentally evaluate our algorithms with and without predictions on real and synthetic datasets and demonstrate that the improvements predicted by theory hold in practice.
Researcher Affiliation | Academia | Anders Aamand, MIT, EMAIL; Justin Y. Chen, MIT, EMAIL; Huy Lê Nguyễn, Northeastern University, EMAIL; Sandeep Silwal, MIT, EMAIL; Ali Vakilian, TTIC, EMAIL
Pseudocode | Yes | Algorithm 1 (Not augmented) Frequency update algorithm; Algorithm 2 (Not augmented) Frequency estimation algorithm; Algorithm 3 (Learning-augmented) Frequency update algorithm; Algorithm 4 (Learning-augmented) Frequency estimation algorithm
Open Source Code | No | The paper mentions open-source platforms like Spark and Twitter Algebird, but does not provide a statement or link for the open-source code of its own described methodology.
Open Datasets | Yes | We use the same two real-world datasets and predictions from [36]: the CAIDA and AOL datasets. The CAIDA dataset [12] contains 50 minutes of internet traffic data. ... The AOL dataset [55] contains 80 days of internet search queries.
Dataset Splits | No | The paper describes the datasets used but does not explicitly provide details about training, validation, or test dataset splits for its own experiments.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions general software like Spark and Algebird in the context of existing implementations but does not list specific software dependencies with version numbers for its own experimental setup.
Experiment Setup | Yes | For all implementations, we use three rows in the CS table and vary the number of columns. ... If the median estimate of an element is below a threshold of Cn/w for domain size n, sketch width w (a third of the total space), and a tunable constant C, the estimate is instead set to 0.
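
The Experiment Setup entry can be illustrated with a minimal Python sketch of a three-row CountSketch (CS) table whose median estimate is set to 0 when it falls below Cn/w. This is not the authors' code, which the page reports as unavailable: the class name ThresholdedCountSketch, the salted BLAKE2b hashing, and the parameter names are assumptions made for illustration only.

```python
import hashlib
from statistics import median


class ThresholdedCountSketch:
    """Illustrative CountSketch with a cutoff; not the paper's implementation."""

    def __init__(self, width, rows=3):
        self.width = width  # w: columns per row (a third of the space when rows=3)
        self.rows = rows    # three rows, as in the quoted setup
        self.table = [[0] * width for _ in range(rows)]

    def _bucket_and_sign(self, row, item):
        # Assumption: a salted BLAKE2b digest stands in for the per-row
        # hash functions; any suitable hash family could be substituted.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        h = int.from_bytes(digest, "little")
        bucket = (h >> 1) % self.width   # column index in this row
        sign = 1 if h & 1 else -1        # random sign from the low bit
        return bucket, sign

    def update(self, item, count=1):
        # Add the signed count to one cell in every row.
        for r in range(self.rows):
            bucket, sign = self._bucket_and_sign(r, item)
            self.table[r][bucket] += sign * count

    def estimate(self, item, n, C):
        # Median of the per-row signed counters, zeroed below C * n / w.
        per_row = []
        for r in range(self.rows):
            bucket, sign = self._bucket_and_sign(r, item)
            per_row.append(sign * self.table[r][bucket])
        med = median(per_row)
        threshold = C * n / self.width
        return 0 if med < threshold else med
```

For example, after 1000 calls to update("heavy") on an otherwise empty sketch of width 64, estimate("heavy", n=1, C=1.0) returns 1000, whereas a very large C pushes the threshold above the median and the estimate becomes 0.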