Meta-Learning Neural Bloom Filters

Authors: Jack Rae, Sergey Bartunov, Timothy Lillicrap

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments explore scenarios where set membership can be learned in one shot with improved compression over the classical Bloom Filter. We compared the Neural Bloom Filter with three memory-augmented neural networks, the LSTM, DNC, and Memory Network, which are all able to write storage sets in one shot. We compared the space (in bits) of the model's memory (or state) to a Bloom Filter at a given false positive rate and 0% false negative rate. The false positive rate is measured empirically over a sample of 50,000 queries for the learned models. (A sketch of the classical Bloom Filter space calculation used as the baseline appears after this table.)
Researcher Affiliation | Collaboration | Jack W. Rae (1,2), Sergey Bartunov (1), Timothy P. Lillicrap (1,2). (1) DeepMind, London, UK; (2) CoMPLEX, Computer Science, University College London, London, UK. Correspondence to: Jack W. Rae <jwrae@google.com>.
Pseudocode | Yes | Algorithm 1: Neural Bloom Filter; Algorithm 2: Meta-Learning Training. (Schematic, hedged sketches of both appear after this table.)
Open Source Code | No | No explicit statement found about releasing the source code for the work described in this paper, nor a direct link to a source-code repository.
Open Datasets | Yes | Sampling Strategies on MNIST (Section 5.2); "We chose the 2.5M unique tokens in the GigaWord v5 news corpus to be our universe" (Section 5.4).
Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) is provided for validation data. The paper describes a meta-learning training scheme that involves sampling tasks and sets, but does not detail a fixed train/validation/test split for a single dataset.
Hardware Specification | Yes | We benchmark the models on the CPU (Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz) and on the GPU (NVIDIA Quadro P6000), with models implemented in TensorFlow without any model-specific optimizations.
Software Dependencies | No | The paper mentions TensorFlow but does not specify a version number or other versioned libraries, which would be required for reproducibility.
Experiment Setup | Yes | To give an example network configuration, we chose f_enc to be a 3-layer CNN in the case of image inputs, and a 128-hidden-unit LSTM in the case of text inputs. We chose f_w and f_q to be an MLP with a single hidden layer of size 128, followed by layer normalization, and f_out to be a 3-layer MLP with residual connections. We used a leaky ReLU as the non-linearity. For each model we sweep over hyper-parameters relating to model size to obtain their smallest operating size at the desired false positive rate (for the full set, see Appendix D). (An illustrative configuration sketch appears at the end of this section.)
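
For reference, the classical Bloom Filter baseline in the space comparison above is sized with the standard formula m = -n * ln(p) / (ln 2)^2 bits for n stored items at false positive rate p. The short Python sketch below is not from the paper; the function names and example numbers are illustrative.

import math

def bloom_filter_bits(n_items: int, fp_rate: float) -> int:
    # Optimal number of bits for a classical Bloom Filter holding n_items
    # elements at false positive rate fp_rate (false negatives are impossible).
    return math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))

def optimal_num_hashes(n_items: int, n_bits: int) -> int:
    # Optimal number of hash functions, k = (m / n) * ln 2.
    return max(1, round(n_bits / n_items * math.log(2)))

# Example: a 5,000-element storage set at a 1% false positive rate.
bits = bloom_filter_bits(5000, 0.01)
print(bits, optimal_num_hashes(5000, bits))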
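
Algorithm 1 (Neural Bloom Filter) is not reproduced here. The NumPy sketch below is an assumption-heavy illustration of the general pattern described in the paper's text: an encoding is mapped to a write word and a soft address, the write is purely additive and one-shot, and a read gathers the address-weighted memory before a learned output network (f_out, omitted here) scores membership. The class name, addressing scheme, dimensions, and random stand-ins for the learned functions are all illustrative, not the authors' exact specification.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class NeuralBloomFilterSketch:
    # Untrained, illustrative stand-in for the learned components f_w and f_q
    # and the addressing matrix; f_enc and f_out are omitted.
    def __init__(self, enc_dim=64, word_dim=32, num_slots=16, rng=None):
        rng = rng or np.random.default_rng(0)
        self.A = rng.normal(size=(enc_dim, num_slots))   # addressing matrix (assumed form)
        self.M = np.zeros((word_dim, num_slots))         # additively written memory
        self.W_w = rng.normal(size=(enc_dim, word_dim))  # stand-in for f_w
        self.W_q = rng.normal(size=(enc_dim, enc_dim))   # stand-in for f_q

    def _address(self, z):
        q = z @ self.W_q                 # query derived from the encoding
        return softmax(q @ self.A)       # soft address over memory slots

    def write(self, z):
        # One-shot, purely additive write: M <- M + w a^T.
        w = z @ self.W_w
        a = self._address(z)
        self.M += np.outer(w, a)

    def read(self, z):
        # Address-weighted read; in the real model this vector (together with
        # the write word and query) would be fed to f_out for a membership score.
        return self.M @ self._address(z)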
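
Algorithm 2 (Meta-Learning Training) is likewise only gestured at here. The sketch below reuses the class above to show the data flow of one training episode: sample a storage set, write it in one shot, then score membership queries drawn from inside and outside the set. The membership score and binary cross-entropy loss are hypothetical stand-ins for f_out and the paper's objective, and no gradient step is shown.

import numpy as np

def episode_loss(model, universe, set_size=8, num_queries=16, rng=None):
    # One illustrative episode over a universe of pre-computed encodings.
    rng = rng or np.random.default_rng(1)
    storage = rng.choice(len(universe), size=set_size, replace=False)
    model.M[:] = 0.0                          # fresh memory for this episode
    for idx in storage:
        model.write(universe[idx])            # one-shot additive writes

    losses = []
    for _ in range(num_queries):
        is_member = rng.random() < 0.5
        idx = rng.choice(storage) if is_member else rng.integers(len(universe))
        z = universe[idx]
        score = model.read(z) @ (z @ model.W_w)   # hypothetical stand-in for f_out
        p = 1.0 / (1.0 + np.exp(-score))
        target = float(idx in storage)
        losses.append(-(target * np.log(p + 1e-9) + (1 - target) * np.log(1 - p + 1e-9)))
    return float(np.mean(losses))

# In training, this loss would be backpropagated through the learned components
# over many sampled episodes; here only the episode's data flow is shown.
universe = np.random.default_rng(2).normal(size=(100, 64))
print(episode_loss(NeuralBloomFilterSketch(enc_dim=64), universe))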
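
The example network configuration in the Experiment Setup row can be mirrored in code. The tf.keras sketch below is an assumption (the paper only states the models were implemented in TensorFlow, not Keras, and gives no filter sizes for the CNN): it shows a 3-layer CNN image encoder, a 128-hidden-unit LSTM text encoder, and an f_w/f_q-style MLP with one hidden layer of 128 units followed by layer normalization, all with leaky ReLU non-linearities. The residual 3-layer f_out MLP is omitted.

import tensorflow as tf

def make_image_encoder():
    # 3-layer CNN encoder for image inputs (channel counts and kernel sizes are guesses).
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation=tf.nn.leaky_relu),
        tf.keras.layers.Conv2D(32, 3, activation=tf.nn.leaky_relu),
        tf.keras.layers.Conv2D(32, 3, activation=tf.nn.leaky_relu),
        tf.keras.layers.Flatten(),
    ])

def make_text_encoder():
    # 128-hidden-unit LSTM encoder (an embedding of the tokens would precede it).
    return tf.keras.Sequential([tf.keras.layers.LSTM(128)])

def make_fw_or_fq(output_dim):
    # MLP with a single hidden layer of size 128 followed by layer normalization.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation=tf.nn.leaky_relu),
        tf.keras.layers.LayerNormalization(),
        tf.keras.layers.Dense(output_dim),
    ])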