reproducibilityindex.ai

Streaming Algorithms for Support-Aware Histograms

Authors: Justin Chen, Piotr Indyk, Tal Wagner

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	On the empirical side, we analyze the performance of our one-pass and two-pass streaming algorithms on several real-world datasets, demonstrating the practical application of support-aware histograms.
Researcher Affiliation	Collaboration	1MIT, Cambridge, MA, USA 2Microsoft Research, Redmond, WA, USA.
Pseudocode	Yes	Algorithm 1 One-Pass Algorithm Algorithm 2 Two-Pass Algorithm
Open Source Code	No	The paper does not provide explicit statements or links to open-source code for the described methodology.
Open Datasets	Yes	The Taxi dataset (NYC Taxi and Limousine Commission, 2021) contains pickup times of yellow cabs in New York City in the month of January 2021.
Dataset Splits	No	The paper describes the datasets and experimental setup but does not specify the train/validation/test splits used for the experiments.
Hardware Specification	No	The paper does not provide specific details on the hardware used for running the experiments.
Software Dependencies	No	The paper mentions specific algorithms and prior work (e.g., Space Saving algorithm, Jowhari et al. for sampling) but does not list specific software libraries or their version numbers used in the implementation.
Experiment Setup	Yes	For the two-pass algorithm, as all of the streams in our experiments are insertion-only, we use the space-saving based implementation of hierarchical heavy hitters (Mitzenmacher et al., 2012). In the ﬁrst pass, the heaviness threshold is set to n lg(n)/s, and in the second pass, the L0 samples allowed by the budget s are evenly distributed over all light intervals. All of the baselines and algorithms involve storing sampled elements from the stream along with their masses. The parameter s is varied to compare how the number of samples affects the performance of the algorithms (as shown in the x-axis of the ﬁgures). The k parameter determines the number of pieces used by the baselines and by the one-pass algorithm (after removing heavy hitters). We set k = 5 in the experiments in this section and display the same ﬁgures with k = 10 in Appendix E.