On Memorization in Probabilistic Deep Generative Models

Authors: Gerrit van den Burg, Chris Williams

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Next, we present a study that demonstrates how memorization can occur in probabilistic deep generative models such as variational autoencoders. This reveals that the form of memorization to which these models are susceptible differs fundamentally from mode collapse and overfitting. Furthermore, we show that the proposed memorization score measures a phenomenon that is not captured by commonly-used nearest neighbor tests. Finally, we discuss several strategies that can be used to limit memorization in practice. Our work thus provides a framework for understanding problematic memorization in probabilistic generative models.
Researcher Affiliation | Academia | Gerrit J.J. van den Burg (gertjanvandenburg@gmail.com); Christopher K.I. Williams, University of Edinburgh and The Alan Turing Institute (ckiw@inf.ed.ac.uk)
Pseudocode | Yes | Algorithm 1: Computing the Cross-Validated Memorization Score
Open Source Code | Yes | Code to reproduce our experiments can be found in an online repository. See: https://github.com/alan-turing-institute/memorization.
Open Datasets | Yes | We use importance sampling on the decoder [47] to approximate log p_θ(x_i) for the computation of the memorization score, and focus on the MNIST [48], CIFAR-10 [49], and CelebA [50] data sets. (A hedged sketch of such an estimator appears after this table.)
Dataset Splits | Yes | Instead of using a leave-one-out method or random sampling, we use a K-fold approach as is done in cross-validation. Let I_k denote randomly sampled disjoint subsets of the indices [n] = {1, ..., n} of size n/K, such that ⋃_{k=1}^{K} I_k = [n]. We then train the model on each of the training sets D_{[n] \ I_k} and compute the log probability for all observations in the training set and the holdout set D_{I_k}. ... The memorization score is estimated using L = 10 repetitions and K = 10 folds. (A minimal sketch of this procedure follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies | No | The paper states, 'For the optimization we use Adam [51] and we implement all models in PyTorch [52].' While PyTorch is mentioned, a specific version number is not provided, which is required for reproducibility.
Experiment Setup | Yes | The memorization score is estimated using L = 10 repetitions and K = 10 folds. ... With a learning rate of η = 10⁻³ (blue curves), a clear generalization gap can be seen in the loss curves... This generalization gap disappears when training with the smaller learning rate of η = 10⁻⁴ (yellow curves).
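
The "Open Datasets" row above notes that log p_θ(x_i) is approximated with importance sampling on the decoder [47]. The following is a minimal PyTorch sketch of a standard importance-weighted estimator for a Gaussian-latent VAE; the `encoder`/`decoder` interfaces and the sample count are illustrative assumptions, not taken from the authors' repository.

```python
import torch

def estimate_log_px(x, encoder, decoder, num_samples=128):
    """Importance-sampling estimate of log p_theta(x) for a VAE.

    Assumed (hypothetical) interfaces:
      encoder(x) -> (mu, logvar) of the Gaussian posterior q(z | x)
      decoder(z) -> log p_theta(x | z) per latent sample, shape (num_samples,)
    """
    mu, logvar = encoder(x)
    std = torch.exp(0.5 * logvar)

    # Draw S latent samples z_s ~ q(z | x)
    eps = torch.randn(num_samples, *mu.shape)
    z = mu + std * eps

    # Log-densities under the prior p(z) = N(0, I) and the posterior q(z | x)
    prior = torch.distributions.Normal(torch.zeros_like(mu), torch.ones_like(mu))
    posterior = torch.distributions.Normal(mu, std)
    log_pz = prior.log_prob(z).sum(dim=-1)
    log_qz = posterior.log_prob(z).sum(dim=-1)

    # log p(x) ~= log (1/S) * sum_s p(x | z_s) p(z_s) / q(z_s | x)
    log_w = decoder(z) + log_pz - log_qz
    return torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(num_samples)))
```

The average is taken with `torch.logsumexp` so the importance weights never need to be exponentiated explicitly, which keeps the estimate numerically stable when log p_θ(x | z) is strongly negative.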
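
The "Dataset Splits" row describes the K-fold scheme behind Algorithm 1. Below is a minimal NumPy sketch of that scheme, under the assumption (not stated verbatim in the excerpt) that the memorization score of x_i is its average log-probability under models trained with x_i minus its average log-probability under models for which x_i fell in the holdout fold; `train_model` and `log_px` are hypothetical stand-ins, where `log_px` could be the importance-sampling estimator sketched above.

```python
import numpy as np

def memorization_scores(dataset, train_model, log_px, K=10, L=10, seed=0):
    """Cross-validated memorization scores (a sketch of the K-fold scheme).

    `train_model(subset)` returns a fitted generative model and
    `log_px(model, x)` returns an estimate of log p_theta(x); both are
    hypothetical placeholders for the paper's actual training and
    log-likelihood estimation code.
    """
    n = len(dataset)
    rng = np.random.default_rng(seed)
    in_train_sum = np.zeros(n); in_train_cnt = np.zeros(n)
    held_out_sum = np.zeros(n); held_out_cnt = np.zeros(n)

    for _ in range(L):                        # L = 10 repetitions
        perm = rng.permutation(n)
        folds = np.array_split(perm, K)       # disjoint index sets I_1, ..., I_K
        for holdout in folds:
            train_idx = np.setdiff1d(perm, holdout)
            model = train_model([dataset[i] for i in train_idx])
            logp = np.array([log_px(model, dataset[i]) for i in range(n)])
            mask = np.zeros(n, dtype=bool); mask[holdout] = True
            held_out_sum[mask] += logp[mask]; held_out_cnt[mask] += 1
            in_train_sum[~mask] += logp[~mask]; in_train_cnt[~mask] += 1

    # Score: mean log-prob with x_i in the training set minus mean log-prob
    # when x_i was held out; large positive values indicate memorization.
    return in_train_sum / in_train_cnt - held_out_sum / held_out_cnt
```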