Fast Partitioned Learned Bloom Filter
Authors: Atsuki Sato, Yusuke Matsui
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results from real-world datasets show that (i) fast PLBF and fast PLBF++ can construct the data structure up to 233 and 761 times faster than PLBF, (ii) fast PLBF can achieve the same memory efficiency as PLBF, and (iii) fast PLBF++ can achieve almost the same memory efficiency as PLBF. |
| Researcher Affiliation | Academia | Atsuki Sato and Yusuke Matsui, The University of Tokyo, Tokyo, Japan (a_sato@hal.t.u-tokyo.ac.jp, matsui@hal.t.u-tokyo.ac.jp) |
| Pseudocode | Yes | The pseudo-code for PLBF construction is provided in the appendix (referring to Algorithms 1-5 in the appendix). |
| Open Source Code | Yes | The codes are available at https://github.com/atsukisato/FastPLBF. |
| Open Datasets | Yes | Malicious URLs Dataset: As in previous papers [11, 14], we used the Malicious URLs Dataset [17]. ... [17] Manu Siddhartha. Malicious URLs Dataset, Kaggle, 2021. URL: https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset [Online; accessed 22-December-2022]. EMBER Dataset: We used the EMBER dataset [18] as in the PLBF research. ... [18] Hyrum S. Anderson and Phil Roth. EMBER: An open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637, 2018. |
| Dataset Splits | No | We used all malicious URLs and 342,482 (80%) benign URLs as the training set, and the remaining benign URLs as the test set. ... We used all malicious files and 300,000 (75%) benign files as the training set and the remaining benign files as the test set. The paper specifies training and test sets but does not explicitly mention a separate validation set split (a hedged split sketch follows the table). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (CPU, GPU models, memory, etc.) used for running the experiments. |
| Software Dependencies | No | While any model can be used for the classifier, we used LightGBM [19] because of its speed in training and inference, as well as its memory efficiency and accuracy. The paper mentions LightGBM but does not provide a specific version number for it or any other software dependency (a hedged training sketch follows the table). |
| Experiment Setup | Yes | Following the experiments in the PLBF paper, hyperparameters for PLBF, fast PLBF, and fast PLBF++ were set to N = 1,000 and k = 5. ... The memory size is specified by the user, and N and k are hyperparameters that are determined by balancing construction time and accuracy (a simplified construction sketch follows the table). |
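The split described in the Dataset Splits row can be reproduced along these lines. This is a minimal sketch, assuming the Kaggle Malicious URLs Dataset ships as a single CSV with `url` and `type` columns; the file name and column names are assumptions, as the paper does not state them.

```python
# Hedged sketch of the Malicious URLs split: all malicious URLs plus
# 342,482 (80%) benign URLs for training, remaining benign URLs for testing.
import pandas as pd

df = pd.read_csv("malicious_phish.csv")  # assumed file name

malicious = df[df["type"] != "benign"]
benign = df[df["type"] == "benign"].sample(frac=1.0, random_state=0)  # shuffle

n_train_benign = 342_482  # 80% of benign URLs, as reported in the paper
train = pd.concat([malicious, benign.iloc[:n_train_benign]])
test_benign = benign.iloc[n_train_benign:]  # test queries are all benign
```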
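The Software Dependencies row names LightGBM but no version or configuration, so the following is only a sketch with default hyperparameters, continuing from the split above; `urls_to_features` is a hypothetical featurizer, since the paper does not describe how URLs are turned into feature vectors.

```python
# Hedged sketch: train a LightGBM scorer for the learned Bloom filter.
import lightgbm as lgb

X_train = urls_to_features(train["url"])           # hypothetical helper
y_train = (train["type"] != "benign").astype(int)  # 1 = key (malicious)

clf = lgb.LGBMClassifier()  # defaults; the paper specifies no settings
clf.fit(X_train, y_train)

# Scores in [0, 1] feed the partitioned filter in the next sketch.
key_scores = clf.predict_proba(X_train)[:, 1]
```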
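To make the N = 1,000 and k = 5 setup in the Experiment Setup row concrete, here is a deliberately simplified construction sketch. PLBF's actual contribution is finding the optimal partition of the N score segments into k regions, together with per-region false-positive rates, via dynamic programming, which fast PLBF and fast PLBF++ accelerate; the equal-width partition and fixed filter sizes below are illustrative assumptions only.

```python
# Hedged, simplified PLBF-style structure: N score segments grouped into
# k regions, one Bloom filter per region. Equal-width regions and fixed
# filter sizes stand in for the paper's DP-optimized partition.
import hashlib

class BloomFilter:
    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _indexes(self, item: str):
        for seed in range(self.n_hashes):
            h = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for i in self._indexes(item):
            self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[i // 8] >> (i % 8) & 1 for i in self._indexes(item))

N, k = 1000, 5  # hyperparameters reported in the paper

def region_of(score: float) -> int:
    segment = min(int(score * N), N - 1)  # which of the N segments
    return segment * k // N               # equal-width grouping (simplified)

# keys = malicious URLs, key_scores = classifier scores (previous sketches)
keys = list(malicious["url"])
filters = [BloomFilter(n_bits=1 << 16, n_hashes=4) for _ in range(k)]
for key, score in zip(keys, key_scores):
    filters[region_of(score)].add(key)

def query(item: str, score: float) -> bool:
    return item in filters[region_of(score)]
```

A query scores the item with the classifier and probes only the Bloom filter of its region. In the real PLBF, the dynamic program also chooses each region's false-positive rate, so regions differ in bits per key; here all filters are the same size purely for brevity.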