reproducibilityindex.ai

Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements

Authors: Ankush Mandal, He Jiang, Anshumali Shrivastava, Vivek Sarkar

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show that the new proposal is on average 2.5x faster in practice than FA for distributed and multi-threaded execution.
Researcher Affiliation	Academia	Ankush Mandal School of Computer Science Georgia Institute of Technology Atlanta, GA ankush@gatech.edu; He Jiang Department of Computer Science Rice University Houston, TX cary.jiang@rice.edu; Anshumali Shrivastava Department of Computer Science Rice University Houston, TX anshumali@rice.edu; Vivek Sarkar School of Computer Science Georgia Institute of Technology Atlanta, GA vsarkar@gatech.edu
Pseudocode	Yes	Algorithm 1: Topkapi; Algorithm 2: Topkapi_Parallel(S[][], K, N, T)
Open Source Code	Yes	https://github.com/ankushmandal/topkapi.git
Open Datasets	Yes	As for data we have used text data from the Project Gutenberg [1] corpus and PUMA Datasets [9].
Dataset Splits	No	The paper uses "16GB data" and "128GB data" from public sources and discusses "strong scaling" and "weak scaling," but it does not specify train/validation/test splits for these datasets.
Hardware Specification	Yes	Used a cluster of Intel R Westmere processors with each node having 12 cores. Used a single node with 32 cores from four IBM Power R 7 chip. Used a cluster of IBM Power R 7 processors where each node has 32 cores from four processor chips.
Software Dependencies	No	The paper states implementations are in C++ and uses MPI for communication, but it does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	For all the experiments, K is set to 100 unless otherwise stated. We used 512 and 2048 buckets or counters respectively for K=50 and K=200. For example, with l = 4 and b = 1024, the size of the count array is 16KB and the size of the id array is 64KB. Number of threads per node is 8. Used a cluster of Intel R Westmere processors with each node having 12 cores.