WOR and $p$'s: Sketches for $\ell_p$-Sampling Without Replacement

Authors: Edith Cohen, Rasmus Pagh, David Woodruff

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our design is simple and practical, despite intricate analysis, and based on off-the-shelf use of widely implemented heavy hitters sketches such as Count Sketch. Our method is the first to provide WOR sampling in the important regime of p > 1 and the first to handle signed updates for p > 0. ... As a bonus, we include practical optimizations (that preserve the theoretical guarantees) and perform experiments that demonstrate both the practicality and accuracy of WORp.
Researcher Affiliation Collaboration Edith Cohen Google Research Tel Aviv University edith@cohenwang.com Rasmus Pagh IT University of Copenhagen BARC Google Research pagh@itu.dk David P. Woodruff CMU dwoodruf@cs.cmu.edu
Pseudocode Yes Algorithm 1: WORp (high level)
Open Source Code Yes Code for the experiments is provided in the following Colab notebook https://colab.research. google.com/drive/1Tix7Sws Pp7A_Ot Sua Rf3Iwf TH-qo9_81?usp=sharing
Open Datasets No The paper uses Zipfian distributions for its experiments, which are standard synthetic data models. While reproducible by definition, the paper does not provide a specific URL, DOI, repository, or formal citation to a pre-existing, externally hosted 'dataset' instance of these distributions, which is required for 'concrete access information'.
Dataset Splits No The paper does not specify training, validation, and test dataset splits. The experiments involve simulations and repetitions on data generated from Zipfian distributions rather than fixed dataset splits for model training.
Hardware Specification No The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models or other computing specifications.
Software Dependencies No The paper mentions 'Python' and 'Count Sketch' as software components used in the experiments but does not provide specific version numbers for these, which is required for a reproducible description of ancillary software.
Experiment Setup Yes We simulated 2-pass and 1-pass WORp in Python using Count Sketch with 15 repetitions and table size 2k (total space 30k) as our r HH sketch.