reproducibilityindex.ai

Fast Partitioned Learned Bloom Filter

Authors: Atsuki Sato, Yusuke Matsui

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results from real-world datasets show that (i) fast PLBF and fast PLBF++ can construct the data structure up to 233 and 761 times faster than PLBF, (ii) fast PLBF can achieve the same memory efficiency as PLBF, and (iii) fast PLBF++ can achieve almost the same memory efficiency as PLBF.
Researcher Affiliation	Academia	Atsuki Sato Yusuke Matsui The University of Tokyo Tokyo, Japan a_sato@hal.t.u-tokyo.ac.jp matsui@hal.t.u-tokyo.ac.jp
Pseudocode	Yes	The pseudo-code for PLBF construction is provided in the appendix. (Referring to Algorithms 1-5 in the appendix.)
Open Source Code	Yes	The codes are available at https://github.com/atsukisato/FastPLBF.
Open Datasets	Yes	Malicious URLs Dataset: As in previous papers [11, 14], we used Malicious URLs Dataset [17]. ...[17] Manu Siddhartha. Malicious urls dataset | kaggle. URL https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset [Online; accessed 22-December-2022], 2021. EMBER Dataset: We used the EMBER dataset [18] as in the PLBF research. ...[18] Hyrum S Anderson and Phil Roth. Ember: an open dataset for training static pe malware machine learning models. arXiv preprint arXiv:1804.04637, 2018.
Dataset Splits	No	We used all malicious URLs and 342,482 (80%) benign URLs as the training set, and the remaining benign URLs as the test set. ... We used all malicious files and 300,000 (75%) benign files as the train set and the remaining benign files as the test set. The paper specifies training and test sets but does not explicitly mention a separate validation set split.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (CPU, GPU models, memory, etc.) used for running the experiments.
Software Dependencies	No	While any model can be used for the classifier, we used LightGBM [19] because of its speed in training and inference, as well as its memory efficiency and accuracy. The paper mentions LightGBM but does not provide a specific version number for it or any other software dependency.
Experiment Setup	Yes	Following the experiments in the PLBF paper, hyperparameters for PLBF, fast PLBF, and fast PLBF++ were set to N = 1,000 and k = 5. ... The memory size is specified by the user, and N and k are hyperparameters that are determined by balancing construction time and accuracy.
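
The split quoted in the Dataset Splits row (all positive keys plus a fixed fraction of the negative keys for training, with the remaining negatives held out for testing) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `split_keys_and_non_keys`, the `seed` parameter, and the toy data are assumptions made here. Only the 80% fraction is taken from the Malicious URLs description above. The paper does not state how the benign samples were selected, so the seeded shuffle is also an assumption.

```python
import random


def split_keys_and_non_keys(keys, non_keys, train_fraction, seed=0):
    """Return (train_keys, train_non_keys, test_non_keys).

    All keys (e.g. malicious URLs) go into the training set.
    A `train_fraction` share of the non-keys (e.g. benign URLs) is used
    for training, and the rest is held out as the test set.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be in [0, 1]")
    rng = random.Random(seed)      # hypothetical seed, for repeatability only
    shuffled = list(non_keys)      # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return list(keys), shuffled[:n_train], shuffled[n_train:]


# Toy stand-ins for the real datasets (hypothetical values).
malicious = [f"bad-{i}.example" for i in range(10)]
benign = [f"good-{i}.example" for i in range(100)]

train_keys, train_non_keys, test_non_keys = split_keys_and_non_keys(
    malicious, benign, train_fraction=0.8
)
print(len(train_keys), len(train_non_keys), len(test_non_keys))  # 10 80 20
```

The test set contains only non-keys because a Bloom filter never returns a false negative for an inserted key, so the quantity measured on held-out data is the false positive rate.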