On Memorization in Probabilistic Deep Generative Models

Authors: Gerrit van den Burg, Chris Williams

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Next, we present a study that demonstrates how memorization can occur in probabilistic deep generative models such as variational autoencoders. This reveals that the form of memorization to which these models are susceptible differs fundamentally from mode collapse and overfitting. Furthermore, we show that the proposed memorization score measures a phenomenon that is not captured by commonly-used nearest neighbor tests. Finally, we discuss several strategies that can be used to limit memorization in practice. Our work thus provides a framework for understanding problematic memorization in probabilistic generative models.
Researcher Affiliation | Academia | Gerrit J.J. van den Burg (gertjanvandenburg@gmail.com); Christopher K.I. Williams, University of Edinburgh and The Alan Turing Institute (ckiw@inf.ed.ac.uk)
Pseudocode | Yes | Algorithm 1: Computing the Cross-Validated Memorization Score
Open Source Code | Yes | Code to reproduce our experiments can be found in an online repository. See: https://github.com/alan-turing-institute/memorization.
Open Datasets | Yes | We use importance sampling on the decoder [47] to approximate log p_θ(x_i) for the computation of the memorization score, and focus on the MNIST [48], CIFAR-10 [49], and CelebA [50] data sets. (A hedged sketch of such an estimator appears after this table.)
Dataset Splits | Yes | Instead of using a leave-one-out method or random sampling, we use a K-fold approach as is done in cross-validation. Let I_k denote randomly sampled disjoint subsets of the indices [n] = {1, ..., n} of size n/K, such that ⋃_{k=1}^{K} I_k = [n]. We then train the model on each of the training sets D_{[n] \ I_k} and compute the log probability for all observations in the training set and the holdout set D_{I_k}. ... The memorization score is estimated using L = 10 repetitions and K = 10 folds. (A minimal sketch of this procedure follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies | No | The paper states, 'For the optimization we use Adam [51] and we implement all models in PyTorch [52].' While PyTorch is mentioned, a specific version number is not provided, which is required for reproducibility.
Experiment Setup | Yes | The memorization score is estimated using L = 10 repetitions and K = 10 folds. ... With a learning rate of η = 10⁻³ (blue curves), a clear generalization gap can be seen in the loss curves... This generalization gap disappears when training with the smaller learning rate of η = 10⁻⁴ (yellow curves).
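
The "Open Datasets" row above notes that log p_θ(x_i) is approximated with importance sampling on the decoder [47]. The following is a minimal PyTorch sketch of a standard importance-weighted estimator for a Gaussian-latent VAE; the `encoder`/`decoder` interfaces and the sample count are illustrative assumptions, not taken from the authors' repository.

```python
import torch

def estimate_log_px(x, encoder, decoder, num_samples=128):
    """Importance-sampling estimate of log p_theta(x) for a VAE.

    Assumed (hypothetical) interfaces:
      encoder(x) -> (mu, logvar) of the Gaussian posterior q(z | x)
      decoder(z) -> log p_theta(x | z) per latent sample, shape (num_samples,)
    """
    mu, logvar = encoder(x)
    std = torch.exp(0.5 * logvar)

    # Draw S latent samples z_s ~ q(z | x)
    eps = torch.randn(num_samples, *mu.shape)
    z = mu + std * eps

    # Log-densities under the prior p(z) = N(0, I) and the posterior q(z | x)
    prior = torch.distributions.Normal(torch.zeros_like(mu), torch.ones_like(mu))
    posterior = torch.distributions.Normal(mu, std)
    log_pz = prior.log_prob(z).sum(dim=-1)
    log_qz = posterior.log_prob(z).sum(dim=-1)

    # log p(x) ~= log (1/S) * sum_s p(x | z_s) p(z_s) / q(z_s | x)
    log_w = decoder(z) + log_pz - log_qz
    return torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(num_samples)))
```

The average is taken with `torch.logsumexp` so the importance weights never need to be exponentiated explicitly, which keeps the estimate numerically stable when log p_θ(x | z) is strongly negative.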
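
The "Dataset Splits" row describes the K-fold scheme behind Algorithm 1. Below is a minimal NumPy sketch of that scheme, under the assumption (not stated verbatim in the excerpt) that the memorization score of x_i is its average log-probability under models trained with x_i minus its average log-probability under models for which x_i fell in the holdout fold; `train_model` and `log_px` are hypothetical stand-ins, where `log_px` could be the importance-sampling estimator sketched above.

```python
import numpy as np

def memorization_scores(dataset, train_model, log_px, K=10, L=10, seed=0):
    """Cross-validated memorization scores (a sketch of the K-fold scheme).

    `train_model(subset)` returns a fitted generative model and
    `log_px(model, x)` returns an estimate of log p_theta(x); both are
    hypothetical placeholders for the paper's actual training and
    log-likelihood estimation code.
    """
    n = len(dataset)
    rng = np.random.default_rng(seed)
    in_train_sum = np.zeros(n); in_train_cnt = np.zeros(n)
    held_out_sum = np.zeros(n); held_out_cnt = np.zeros(n)

    for _ in range(L):                        # L = 10 repetitions
        perm = rng.permutation(n)
        folds = np.array_split(perm, K)       # disjoint index sets I_1, ..., I_K
        for holdout in folds:
            train_idx = np.setdiff1d(perm, holdout)
            model = train_model([dataset[i] for i in train_idx])
            logp = np.array([log_px(model, dataset[i]) for i in range(n)])
            mask = np.zeros(n, dtype=bool); mask[holdout] = True
            held_out_sum[mask] += logp[mask]; held_out_cnt[mask] += 1
            in_train_sum[~mask] += logp[~mask]; in_train_cnt[~mask] += 1

    # Score: mean log-prob with x_i in the training set minus mean log-prob
    # when x_i was held out; large positive values indicate memorization.
    return in_train_sum / in_train_cnt - held_out_sum / held_out_cnt
```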