Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences

Authors: Damien Ferbach, Quentin Bertrand, Joey Bose, Gauthier Gidel

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct illustrative experiments on both synthetic datasets and on CIFAR10 showing that such a procedure amplifies biases of the reward model." "4 Experiments: This section aims to empirically illustrate our previous theoretical results on how curation impacts the self-consuming loop."
Researcher Affiliation | Academia | "1 Mila, Université de Montréal; 2 École Normale Supérieure de Paris; 3 University of Oxford; 4 Canada CIFAR AI Chair"
Pseudocode | Yes | Algorithm 1 (Iterative retraining with curated synthetic data); a plain-PyTorch sketch of this curation loop is given after the table:
    Input: D_real := {x_i}_{i=1}^n (true data), A (learning procedure)
    Parameters: T (number of retraining iterations), λ (proportion of generated data), β (reward multiplicative factor)
    p_0 = A(D_real)  // learn generative model on true data
    for t = 1, ..., T:
        for i = 1, ..., λn:
            x^1, ..., x^K ~ p_{t-1}  // sample K synthetic data points
            x̂_i ← x^k, where x^k is selected by a user with probability e^{r(x^k)} / Σ_{j=1}^K e^{r(x^j)}, 1 ≤ k ≤ K  // Luce's model
        D_filtered = {x̂_i}_{i=1}^{λn}  // new filtered dataset
        p_t = A(D_real ∪ D_filtered)  // generative model is learned on synthetic and true data
    return p_T
Open Source Code | No | "We did not release the code in the submission, since the experiments are mainly illustrative of our theoretical results and were not required by the reviewers."
Open Datasets | Yes | "The initial model has been pretrained on the 50000 train images of the CIFAR-10 dataset (Krizhevsky et al., 2009)."
Dataset Splits | No | "The initial model has been pretrained on the 50000 train images of the CIFAR-10 dataset (Krizhevsky et al., 2009)."
Hardware Specification | Yes | "On a A100 GPU of 40GB RAM and using 4 workers with total 32 GB RAM, retraining for 20 iterations with generation of 50000 samples took about 22 hours."
Software Dependencies | No | "We train a normalizing flow using optimal transport conditional flow matching (Lipman et al., 2022; Shaul et al., 2023; Tong et al., 2023b) with the torchcfm library (Tong et al., 2023a, 2024)." (A minimal torchcfm training-step sketch follows the table.)
Experiment Setup | Yes | "We use a time discretization in 250 steps. Finally, we retrain the model for multiple iterations (8 for MoG, 5 for two moons), first only on real data and then on filtered synthetic samples from the previous iteration using pairwise comparisons. We use 5·10^3 initial samples from the real data distribution and 5·10^3 generated samples filtered from 10^4 generated initial samples. When mixing, we use equal fractions of real and filtered samples. For the two moons we add a Gaussian noise with standard deviation 1·10^-1. At each iteration, we generate 5·10^4 samples using the current model from which we keep 2.5·10^3 samples filtered by discrete K-choice comparisons. The reward r(x) is computed using the class probabilities q_0(x), ..., q_9(x) from a pretrained VGG11 classifier (Simonyan and Zisserman, 2014) with 92.39% test accuracy. Due to the expensive compute cost of retraining a generative model for multiple iterations (c.f. Appendix A.5.4), we plot only one run on each figure. To ensure the reproducibility of our results, we plot the retraining curves for 3 independent runs in Figure 11 in the appendix, illustrating that they have small variance." (A sketch of the classifier-based reward and the K-choice filtering follows the table.)
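The "Pseudocode" excerpt above gives Algorithm 1 in its original notation. As a concrete reading, the following is a minimal plain-PyTorch sketch of the same curation loop. It is not the authors' released code: learn, sample, and reward_fn are placeholder callables standing in for the learning procedure A, the sampler of p_{t-1}, and the reward r, and the default values of T, λ, K, and β are illustrative only.

```python
import torch


def luce_select(candidates, reward_fn, beta=1.0, generator=None):
    """Pick one of K candidates with probability proportional to exp(beta * r(x)) (Luce's choice model)."""
    rewards = torch.tensor([float(reward_fn(x)) for x in candidates])   # shape (K,)
    probs = torch.softmax(beta * rewards, dim=0)                        # e^{beta r(x_k)} / sum_j e^{beta r(x_j)}
    idx = torch.multinomial(probs, num_samples=1, generator=generator).item()
    return candidates[idx]


def iterative_retraining(d_real, learn, sample, reward_fn, T=5, lam=1.0, K=2, beta=1.0):
    """Algorithm 1 skeleton: retrain a generative model on real data plus curated synthetic data.

    learn(dataset) -> model and sample(model, k) -> list of k synthetic points are placeholders
    for the user's own training and sampling routines (assumptions, not the paper's code).
    """
    model = learn(d_real)                                # p_0 = A(D_real): learn on true data only
    n = len(d_real)
    for _ in range(T):
        filtered = []
        for _ in range(int(lam * n)):
            candidates = sample(model, K)                # x^1, ..., x^K ~ p_{t-1}
            filtered.append(luce_select(candidates, reward_fn, beta))
        model = learn(list(d_real) + filtered)           # p_t = A(D_real ∪ D_filtered)
    return model
```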
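For the "Software Dependencies" row, the paper trains with optimal transport conditional flow matching via the torchcfm library. The sketch below shows one training step under the assumption that the installed torchcfm version exposes ExactOptimalTransportConditionalFlowMatcher with a sample_location_and_conditional_flow(x0, x1) method, as in the library's documented usage; the small MLP velocity field, the 2-D toy-data setting, and all hyperparameters are illustrative choices, not the paper's configuration.

```python
import torch
from torch import nn
from torchcfm.conditional_flow_matching import ExactOptimalTransportConditionalFlowMatcher

# Tiny velocity-field network for 2-D toy data (e.g. MoG or two moons); input is (x, t), output is a 2-D velocity.
model = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 128), nn.SiLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
fm = ExactOptimalTransportConditionalFlowMatcher(sigma=0.0)


def training_step(x1):
    """One OT-CFM step: regress the network onto the conditional target velocity u_t."""
    x0 = torch.randn_like(x1)                                        # source noise samples
    t, xt, ut = fm.sample_location_and_conditional_flow(x0, x1)      # torchcfm API; check the installed version
    vt = model(torch.cat([xt, t[:, None]], dim=1))                   # condition on time by concatenation
    loss = ((vt - ut) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```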
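For the "Experiment Setup" row, the excerpt states that the CIFAR-10 reward r(x) is computed from the class probabilities q_0(x), ..., q_9(x) of a pretrained VGG11 classifier, but it does not give the exact formula and leaves K unspecified. The sketch below is therefore hedged: it uses the maximum class probability as a hypothetical reward and a placeholder K, and vgg11 and the tensor of generated images are placeholders. Only the K-choice rule itself (softmax over the rewards of K candidates, Luce's model) and the filtering scale (keep 2.5·10^3 out of 5·10^4 generated samples) come from the excerpt.

```python
import torch


@torch.no_grad()
def classifier_reward(vgg11, images):
    """Reward from a pretrained CIFAR-10 classifier's class probabilities q_0(x), ..., q_9(x).

    Using the maximum class probability is a hypothetical choice; the excerpt only says the reward
    is computed from the class probabilities. Images are assumed already preprocessed for the classifier.
    """
    q = torch.softmax(vgg11(images), dim=1)      # (N, 10) class probabilities
    return q.max(dim=1).values                   # r(x) in [0, 1]


@torch.no_grad()
def k_choice_filter(images, rewards, n_keep, K=8, beta=1.0, generator=None):
    """Keep n_keep images by repeated K-choice comparisons: draw K candidates uniformly at random and
    keep one of them with probability softmax(beta * r) over the K candidates (Luce's model).
    Draws are independent, so an image may be kept more than once; deduplication is omitted here."""
    kept = []
    n = images.shape[0]
    for _ in range(n_keep):
        idx = torch.randint(0, n, (K,), generator=generator)
        probs = torch.softmax(beta * rewards[idx], dim=0)
        pick = idx[torch.multinomial(probs, 1, generator=generator)]
        kept.append(images[pick])
    return torch.cat(kept, dim=0)


# Usage at the scale reported in the excerpt: keep 2.5e3 of 5e4 generated CIFAR-10 samples.
# generated = sample_from_current_model(50_000)   # placeholder for the current model's sampler
# rewards = classifier_reward(vgg11, generated)
# curated = k_choice_filter(generated, rewards, n_keep=2_500)
```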