Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences

Authors: Damien Ferbach, Quentin Bertrand, Joey Bose, Gauthier Gidel

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we conduct illustrative experiments on both synthetic datasets and on CIFAR10 showing that such a procedure amplifies biases of the reward model." "4 Experiments: This section aims to empirically illustrate our previous theoretical results on how curation impacts the self-consuming loop."
Researcher Affiliation | Academia | "1 Mila, Université de Montréal; 2 École Normale Supérieure de Paris; 3 University of Oxford; 4 Canada CIFAR AI Chair"
Pseudocode | Yes | Algorithm 1 (Iterative retraining with curated synthetic data); a plain-PyTorch sketch of this curation loop is given after the table:
    Input: D_real := {x_i}_{i=1}^n (true data), A (learning procedure)
    Parameters: T (number of retraining iterations), λ (proportion of generated data), β (reward multiplicative factor)
    p_0 = A(D_real)  // learn generative model on true data
    for t = 1, ..., T:
        for i = 1, ..., λn:
            x^1, ..., x^K ~ p_{t-1}  // sample K synthetic data points
            x̂_i ← x^k, where x^k is selected by a user with probability e^{r(x^k)} / Σ_{j=1}^K e^{r(x^j)}, 1 ≤ k ≤ K  // Luce's model
        D_filtered = {x̂_i}_{i=1}^{λn}  // new filtered dataset
        p_t = A(D_real ∪ D_filtered)  // generative model is learned on synthetic and true data
    return p_T
Open Source Code | No | "We did not release the code in the submission, since the experiments are mainly illustrative of our theoretical results and were not required by the reviewers."
Open Datasets | Yes | "The initial model has been pretrained on the 50000 train images of the CIFAR-10 dataset (Krizhevsky et al., 2009)."
Dataset Splits | No | "The initial model has been pretrained on the 50000 train images of the CIFAR-10 dataset (Krizhevsky et al., 2009)."
Hardware Specification | Yes | "On a A100 GPU of 40GB RAM and using 4 workers with total 32 GB RAM, retraining for 20 iterations with generation of 50000 samples took about 22 hours."
Software Dependencies | No | "We train a normalizing flow using optimal transport conditional flow matching (Lipman et al., 2022; Shaul et al., 2023; Tong et al., 2023b) with the torchcfm library (Tong et al., 2023a, 2024)." (A minimal torchcfm training-step sketch follows the table.)
Experiment Setup | Yes | "We use a time discretization in 250 steps. Finally, we retrain the model for multiple iterations (8 for MoG, 5 for two moons), first only on real data and then on filtered synthetic samples from the previous iteration using pairwise comparisons. We use 5·10^3 initial samples from the real data distribution and 5·10^3 generated samples filtered from 10^4 generated initial samples. When mixing, we use equal fractions of real and filtered samples. For the two moons we add a Gaussian noise with standard deviation 1·10^-1. At each iteration, we generate 5·10^4 samples using the current model from which we keep 2.5·10^3 samples filtered by discrete K-choice comparisons. The reward r(x) is computed using the class probabilities q_0(x), ..., q_9(x) from a pretrained VGG11 classifier (Simonyan and Zisserman, 2014) with 92.39% test accuracy. Due to the expensive compute cost of retraining a generative model for multiple iterations (c.f. Appendix A.5.4), we plot only one run on each figure. To ensure the reproducibility of our results, we plot the retraining curves for 3 independent runs in Figure 11 in the appendix, illustrating that they have small variance." (A sketch of the classifier-based reward and the K-choice filtering follows the table.)
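The "Pseudocode" excerpt above gives Algorithm 1 in its original notation. As a concrete reading, the following is a minimal plain-PyTorch sketch of the same curation loop. It is not the authors' released code: learn, sample, and reward_fn are placeholder callables standing in for the learning procedure A, the sampler of p_{t-1}, and the reward r, and the default values of T, λ, K, and β are illustrative only.

```python
import torch


def luce_select(candidates, reward_fn, beta=1.0, generator=None):
    """Pick one of K candidates with probability proportional to exp(beta * r(x)) (Luce's choice model)."""
    rewards = torch.tensor([float(reward_fn(x)) for x in candidates])   # shape (K,)
    probs = torch.softmax(beta * rewards, dim=0)                        # e^{beta r(x_k)} / sum_j e^{beta r(x_j)}
    idx = torch.multinomial(probs, num_samples=1, generator=generator).item()
    return candidates[idx]


def iterative_retraining(d_real, learn, sample, reward_fn, T=5, lam=1.0, K=2, beta=1.0):
    """Algorithm 1 skeleton: retrain a generative model on real data plus curated synthetic data.

    learn(dataset) -> model and sample(model, k) -> list of k synthetic points are placeholders
    for the user's own training and sampling routines (assumptions, not the paper's code).
    """
    model = learn(d_real)                                # p_0 = A(D_real): learn on true data only
    n = len(d_real)
    for _ in range(T):
        filtered = []
        for _ in range(int(lam * n)):
            candidates = sample(model, K)                # x^1, ..., x^K ~ p_{t-1}
            filtered.append(luce_select(candidates, reward_fn, beta))
        model = learn(list(d_real) + filtered)           # p_t = A(D_real ∪ D_filtered)
    return model
```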
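For the "Software Dependencies" row, the paper trains with optimal transport conditional flow matching via the torchcfm library. The sketch below shows one training step under the assumption that the installed torchcfm version exposes ExactOptimalTransportConditionalFlowMatcher with a sample_location_and_conditional_flow(x0, x1) method, as in the library's documented usage; the small MLP velocity field, the 2-D toy-data setting, and all hyperparameters are illustrative choices, not the paper's configuration.

```python
import torch
from torch import nn
from torchcfm.conditional_flow_matching import ExactOptimalTransportConditionalFlowMatcher

# Tiny velocity-field network for 2-D toy data (e.g. MoG or two moons); input is (x, t), output is a 2-D velocity.
model = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 128), nn.SiLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
fm = ExactOptimalTransportConditionalFlowMatcher(sigma=0.0)


def training_step(x1):
    """One OT-CFM step: regress the network onto the conditional target velocity u_t."""
    x0 = torch.randn_like(x1)                                        # source noise samples
    t, xt, ut = fm.sample_location_and_conditional_flow(x0, x1)      # torchcfm API; check the installed version
    vt = model(torch.cat([xt, t[:, None]], dim=1))                   # condition on time by concatenation
    loss = ((vt - ut) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```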
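For the "Experiment Setup" row, the excerpt states that the CIFAR-10 reward r(x) is computed from the class probabilities q_0(x), ..., q_9(x) of a pretrained VGG11 classifier, but it does not give the exact formula and leaves K unspecified. The sketch below is therefore hedged: it uses the maximum class probability as a hypothetical reward and a placeholder K, and vgg11 and the tensor of generated images are placeholders. Only the K-choice rule itself (softmax over the rewards of K candidates, Luce's model) and the filtering scale (keep 2.5·10^3 out of 5·10^4 generated samples) come from the excerpt.

```python
import torch


@torch.no_grad()
def classifier_reward(vgg11, images):
    """Reward from a pretrained CIFAR-10 classifier's class probabilities q_0(x), ..., q_9(x).

    Using the maximum class probability is a hypothetical choice; the excerpt only says the reward
    is computed from the class probabilities. Images are assumed already preprocessed for the classifier.
    """
    q = torch.softmax(vgg11(images), dim=1)      # (N, 10) class probabilities
    return q.max(dim=1).values                   # r(x) in [0, 1]


@torch.no_grad()
def k_choice_filter(images, rewards, n_keep, K=8, beta=1.0, generator=None):
    """Keep n_keep images by repeated K-choice comparisons: draw K candidates uniformly at random and
    keep one of them with probability softmax(beta * r) over the K candidates (Luce's model).
    Draws are independent, so an image may be kept more than once; deduplication is omitted here."""
    kept = []
    n = images.shape[0]
    for _ in range(n_keep):
        idx = torch.randint(0, n, (K,), generator=generator)
        probs = torch.softmax(beta * rewards[idx], dim=0)
        pick = idx[torch.multinomial(probs, 1, generator=generator)]
        kept.append(images[pick])
    return torch.cat(kept, dim=0)


# Usage at the scale reported in the excerpt: keep 2.5e3 of 5e4 generated CIFAR-10 samples.
# generated = sample_from_current_model(50_000)   # placeholder for the current model's sampler
# rewards = classifier_reward(vgg11, generated)
# curated = k_choice_filter(generated, rewards, n_keep=2_500)
```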