Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences
Authors: Damien Ferbach, Quentin Bertrand, Joey Bose, Gauthier Gidel
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct illustrative experiments on both synthetic datasets and on CIFAR10, showing that such a procedure amplifies biases of the reward model. (Section 4, Experiments) This section aims to empirically illustrate our previous theoretical results on how curation impacts the self-consuming loop. |
| Researcher Affiliation | Academia | ¹Mila, Université de Montréal; ²École Normale Supérieure de Paris; ³University of Oxford; ⁴Canada CIFAR AI Chair |
| Pseudocode | Yes | Algorithm 1 (Iterative retraining with curated synthetic data). Input: true data D_real = {x_i}_{i=1}^n and learning procedure A. Parameters: T (number of retraining iterations), λ (proportion of generated data), β (reward multiplicative factor). Initialize p_0 = A(D_real), i.e. learn the generative model on true data. For t = 1, ..., T: for i = 1, ..., λn, sample K synthetic data points x^1, ..., x^K ~ p_{t-1}; a user selects x̂_i = x^k with probability e^{r(x^k)} / Σ_{j=1}^K e^{r(x^j)}, 1 ≤ k ≤ K (Luce's model). Form the filtered dataset D_filtered = {x̂_i}_{i=1}^{λn} and learn p_t = A(D_real ∪ D_filtered) on true and synthetic data. Return p_T. |
| Open Source Code | No | We did not release the code in the submission, since the experiments are mainly illustrative of our theoretical results and were not required by the reviewers. |
| Open Datasets | Yes | The initial model has been pretrained on the 50000 train images of the CIFAR-10 dataset (Krizhevsky et al., 2009). |
| Dataset Splits | No | The initial model has been pretrained on the 50000 train images of the CIFAR-10 dataset (Krizhevsky et al., 2009). |
| Hardware Specification | Yes | On an A100 GPU with 40 GB of memory, using 4 workers with 32 GB of RAM in total, retraining for 20 iterations with generation of 50,000 samples took about 22 hours. |
| Software Dependencies | No | We train a normalizing flow using optimal transport conditional flow matching (Lipman et al., 2022; Shaul et al., 2023; Tong et al., 2023b) with the torchcfm library (Tong et al., 2023a, 2024). |
| Experiment Setup | Yes | We use a time discretization with 250 steps. Finally, we retrain the model for multiple iterations (8 for MoG, 5 for two moons), first only on real data and then on filtered synthetic samples from the previous iteration using pairwise comparisons. We use 5×10³ initial samples from the real data distribution and 5×10³ generated samples filtered from 10⁴ generated initial samples. When mixing, we use equal fractions of real and filtered samples. For the two moons we add Gaussian noise with standard deviation 1×10⁻¹. At each iteration, we generate 5×10⁴ samples using the current model, from which we keep 2.5×10³ samples filtered by discrete K-choice comparisons. The reward r(x) is computed using the class probabilities q₀(x), ..., q₉(x) from a pretrained VGG11 classifier (Simonyan and Zisserman, 2014) with 92.39% test accuracy. Due to the expensive compute cost of retraining a generative model for multiple iterations (cf. Appendix A.5.4), we plot only one run on each figure. To ensure the reproducibility of our results, we plot the retraining curves for 3 independent runs in Figure 11 in the appendix, illustrating that they have small variance. |
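The curation step of Algorithm 1 can be sketched in a few lines: from each batch of K synthetic samples, a simulated user keeps one with probability proportional to e^{r(x)} (Luce's choice model). This is a minimal illustrative sketch, not the authors' code: the toy `reward` function and all names are assumptions, standing in for the paper's learned reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Toy reward preferring samples near the origin; a stand-in for the
    # reward model r(x) used in the paper's K-choice comparisons.
    return -np.sum(x**2, axis=-1)

def luce_filter(samples, K, n_keep):
    """Keep n_keep samples; each is chosen among K candidates drawn from
    `samples`, with selection probability e^{r(x^k)} / sum_j e^{r(x^j)}."""
    kept = []
    for _ in range(n_keep):
        idx = rng.choice(len(samples), size=K, replace=False)
        cand = samples[idx]
        r = reward(cand)
        p = np.exp(r - r.max())  # numerically stable softmax over rewards
        p /= p.sum()
        kept.append(cand[rng.choice(K, p=p)])
    return np.array(kept)

synthetic = rng.normal(size=(10_000, 2))  # stand-in for generator samples
filtered = luce_filter(synthetic, K=2, n_keep=2_500)
# Curated samples have higher average reward than the raw pool,
# which is the bias-amplification mechanism studied in the paper.
print(reward(filtered).mean() > reward(synthetic).mean())
```

In the full loop, `filtered` would be mixed with the real data (equal fractions, per the setup above) before retraining the generative model.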
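The setup states that r(x) is computed from the class probabilities q₀(x), ..., q₉(x) of a pretrained VGG11, but this excerpt does not give the exact functional form. The sketch below uses one plausible choice, the log-probability of a preferred class; the function name, `preferred_class` parameter, and the formula itself are all illustrative assumptions.

```python
import numpy as np

def reward_from_probs(q, preferred_class):
    # Hypothetical reward: log-probability the classifier assigns to a
    # preferred class. The paper only states that r(x) is computed from
    # the class probabilities q_0(x), ..., q_9(x); this exact form is an
    # assumption for illustration.
    q = np.clip(q, 1e-12, 1.0)  # guard against log(0)
    return np.log(q[..., preferred_class])

# Toy 10-class probability vector for a single sample.
q = np.array([[0.05] * 9 + [0.55]])
r = reward_from_probs(q, preferred_class=9)
print(r.shape, float(r[0]) < 0.0)
```

Any monotone function of the preferred-class probability would induce the same ranking under Luce's model up to the temperature set by the reward's scale.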