Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improved Frequency Estimation Algorithms with and without Predictions
Authors: Anders Aamand, Justin Chen, Huy Nguyen, Sandeep Silwal, Ali Vakilian
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches. ... We experimentally evaluate our algorithms with and without predictions on real and synthetic datasets and demonstrate that the improvements predicted by theory hold in practice. |
| Researcher Affiliation | Academia | Anders Aamand MIT EMAIL Justin Y. Chen MIT EMAIL Huy Lê Nguyê n Northeastern University EMAIL Sandeep Silwal MIT EMAIL Ali Vakilian TTIC EMAIL |
| Pseudocode | Yes | Algorithm 1 (Not augmented) Frequency update algorithm; Algorithm 2 (Not augmented) Frequency estimation algorithm; Algorithm 3 (Learning-augmented) Frequency update algorithm; Algorithm 4 (Learning-augmented) Frequency estimation algorithm |
| Open Source Code | No | The paper mentions open-source platforms like Spark and Twitter Algebird, but does not provide a statement or link for the open-source code of its own described methodology. |
| Open Datasets | Yes | We use the same two real-world datasets and predictions from [36]: the CAIDA and AOL datasets. The CAIDA dataset [12] contains 50 minutes of internet traffic data. ... The AOL dataset [55] contains 80 days of internet search queries. |
| Dataset Splits | No | The paper describes the datasets used but does not explicitly provide details about training, validation, or test dataset splits for its own experiments. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions general software like Spark and Algebird in the context of existing implementations but does not list specific software dependencies with version numbers for its own experimental setup. |
| Experiment Setup | Yes | For all implementations, we use three rows in the CS table and vary the number of columns. ... If the median estimate of an element is below a threshold of Cn/w for domain size n, sketch width w (a third of the total space), and a tunable constant C, the estimate is instead set to 0. |