WOR and $p$'s: Sketches for $\ell_p$-Sampling Without Replacement
Authors: Edith Cohen, Rasmus Pagh, David Woodruff
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our design is simple and practical, despite intricate analysis, and based on off-the-shelf use of widely implemented heavy hitters sketches such as Count Sketch. Our method is the first to provide WOR sampling in the important regime of p > 1 and the first to handle signed updates for p > 0. ... As a bonus, we include practical optimizations (that preserve the theoretical guarantees) and perform experiments that demonstrate both the practicality and accuracy of WORp. |
| Researcher Affiliation | Collaboration | Edith Cohen Google Research Tel Aviv University edith@cohenwang.com Rasmus Pagh IT University of Copenhagen BARC Google Research pagh@itu.dk David P. Woodruff CMU dwoodruf@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1: WORp (high level) |
| Open Source Code | Yes | Code for the experiments is provided in the following Colab notebook https://colab.research. google.com/drive/1Tix7Sws Pp7A_Ot Sua Rf3Iwf TH-qo9_81?usp=sharing |
| Open Datasets | No | The paper uses Zipfian distributions for its experiments, which are standard synthetic data models. While reproducible by definition, the paper does not provide a specific URL, DOI, repository, or formal citation to a pre-existing, externally hosted 'dataset' instance of these distributions, which is required for 'concrete access information'. |
| Dataset Splits | No | The paper does not specify training, validation, and test dataset splits. The experiments involve simulations and repetitions on data generated from Zipfian distributions rather than fixed dataset splits for model training. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models or other computing specifications. |
| Software Dependencies | No | The paper mentions 'Python' and 'Count Sketch' as software components used in the experiments but does not provide specific version numbers for these, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | We simulated 2-pass and 1-pass WORp in Python using Count Sketch with 15 repetitions and table size 2k (total space 30k) as our r HH sketch. |