Fast Partitioned Learned Bloom Filter

Authors: Atsuki Sato, Yusuke Matsui

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results from real-world datasets show that (i) fast PLBF and fast PLBF++ can construct the data structure up to 233 and 761 times faster than PLBF, (ii) fast PLBF can achieve the same memory efficiency as PLBF, and (iii) fast PLBF++ can achieve almost the same memory efficiency as PLBF."
Researcher Affiliation | Academia | Atsuki Sato and Yusuke Matsui, The University of Tokyo, Tokyo, Japan (a_sato@hal.t.u-tokyo.ac.jp, matsui@hal.t.u-tokyo.ac.jp)
Pseudocode | Yes | Pseudo-code for PLBF construction is provided in the appendix (Algorithms 1-5).
Open Source Code | Yes | The code is available at https://github.com/atsukisato/FastPLBF.
Open Datasets | Yes | Malicious URLs Dataset: "As in previous papers [11, 14], we used Malicious URLs Dataset [17]." EMBER Dataset: "We used the EMBER dataset [18] as in the PLBF research." References: [17] Manu Siddhartha. Malicious URLs Dataset, Kaggle, 2021. URL: https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset [Online; accessed 22-December-2022]. [18] Hyrum S. Anderson and Phil Roth. EMBER: An open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637, 2018.
Dataset Splits | No | "We used all malicious URLs and 342,482 (80%) benign URLs as the training set, and the remaining benign URLs as the test set. ... We used all malicious files and 300,000 (75%) benign files as the train set and the remaining benign files as the test set." The paper specifies training and test sets but never mentions a separate validation split. (A minimal split sketch follows the table.)
Hardware Specification | No | The paper does not describe the hardware (CPU and GPU models, memory, etc.) used to run the experiments.
Software Dependencies | No | "While any model can be used for the classifier, we used LightGBM [19] because of its speed in training and inference, as well as its memory efficiency and accuracy." The paper names LightGBM but pins no version for it or for any other dependency. (A minimal classifier sketch follows the table.)
Experiment Setup | Yes | "Following the experiments in the PLBF paper, hyperparameters for PLBF, fast PLBF, and fast PLBF++ were set to N = 1,000 and k = 5. ... The memory size is specified by the user, and N and k are hyperparameters that are determined by balancing construction time and accuracy." (A sketch of how N enters construction follows the table.)
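
The benign/malicious split quoted in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch for the Malicious URLs dataset; the filename `malicious_phish.csv` and the `url`/`type` columns are assumptions about the Kaggle download, not details taken from the paper.

```python
import pandas as pd

# Sketch of the paper's split: all malicious URLs plus 80% of the benign
# URLs form the training set; the remaining benign URLs become the test
# (negative-query) set. File and column names are assumed.
df = pd.read_csv("malicious_phish.csv")

malicious = df[df["type"] != "benign"]  # keys to be stored in the filter
benign = df[df["type"] == "benign"].sample(frac=1.0, random_state=0)

n_train = int(0.8 * len(benign))        # 342,482 benign URLs in the paper
train = pd.concat([malicious, benign.iloc[:n_train]])
test_negatives = benign.iloc[n_train:]  # used to measure the false positive rate
```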
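
Because no dependency versions are pinned, the classifier stage has to be re-created by judgment. The sketch below uses LightGBM's scikit-learn API with toy stand-in features, since the paper does not document its feature extraction; only the choice of LightGBM itself comes from the paper.

```python
import numpy as np
import lightgbm as lgb

# Toy stand-in features and labels; the paper does not document how URLs
# or PE files are featurized, so these arrays only illustrate the stage.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
y = rng.integers(0, 2, size=1000)    # 1 = key (malicious), 0 = non-key

clf = lgb.LGBMClassifier()           # default settings; the paper specifies none
clf.fit(X, y)
scores = clf.predict_proba(X)[:, 1]  # scores in [0, 1], the input to PLBF construction
```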
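
To make the role of N concrete: PLBF-style construction first discretizes classifier scores into N equal-width segments and builds key/non-key histograms over them, and the dynamic programming that selects the k regions runs on those histograms. The sketch below covers only that input-preparation step under the papers' setup; it is not the authors' released code.

```python
import numpy as np

def score_histograms(key_scores, nonkey_scores, N=1000):
    """Discretize scores in [0, 1] into N equal-width segments and return
    the per-segment key / non-key probability mass (g and h in the PLBF
    papers). The construction DP that picks the k regions and their
    per-region false positive rates consumes these two arrays."""
    edges = np.linspace(0.0, 1.0, N + 1)
    g, _ = np.histogram(key_scores, bins=edges)
    h, _ = np.histogram(nonkey_scores, bins=edges)
    return g / g.sum(), h / h.sum()
```

With the table's setting, `score_histograms(..., N=1000)` yields the 1,000-segment histograms over which the k = 5 regions are chosen.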