Streaming Algorithms for Support-Aware Histograms
Authors: Justin Chen, Piotr Indyk, Tal Wagner
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the empirical side, we analyze the performance of our one-pass and two-pass streaming algorithms on several real-world datasets, demonstrating the practical application of support-aware histograms. |
| Researcher Affiliation | Collaboration | 1MIT, Cambridge, MA, USA 2Microsoft Research, Redmond, WA, USA. |
| Pseudocode | Yes | Algorithm 1 One-Pass Algorithm Algorithm 2 Two-Pass Algorithm |
| Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the described methodology. |
| Open Datasets | Yes | The Taxi dataset (NYC Taxi and Limousine Commission, 2021) contains pickup times of yellow cabs in New York City in the month of January 2021. |
| Dataset Splits | No | The paper describes the datasets and experimental setup but does not specify the train/validation/test splits used for the experiments. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions specific algorithms and prior work (e.g., Space Saving algorithm, Jowhari et al. for sampling) but does not list specific software libraries or their version numbers used in the implementation. |
| Experiment Setup | Yes | For the two-pass algorithm, as all of the streams in our experiments are insertion-only, we use the space-saving based implementation of hierarchical heavy hitters (Mitzenmacher et al., 2012). In the first pass, the heaviness threshold is set to n lg(n)/s, and in the second pass, the L0 samples allowed by the budget s are evenly distributed over all light intervals. All of the baselines and algorithms involve storing sampled elements from the stream along with their masses. The parameter s is varied to compare how the number of samples affects the performance of the algorithms (as shown in the x-axis of the figures). The k parameter determines the number of pieces used by the baselines and by the one-pass algorithm (after removing heavy hitters). We set k = 5 in the experiments in this section and display the same figures with k = 10 in Appendix E. |