Data-Copying in Generative Models: A Formal Framework

Authors: Robi Bhattacharjee, Sanjoy Dasgupta, Kamalika Chaudhuri

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We include an empirical comparison between our methods in Section 5.2, where we demonstrate that ours is able to capture certain forms of data-copying that theirs is not. We now return to the example presented in Figure 3 and empirically investigate the following question: is our algorithm able to outperform the one given in (Meehan et al., 2020) over this example?
Researcher Affiliation | Academia | Robi Bhattacharjee¹, Sanjoy Dasgupta¹, Kamalika Chaudhuri¹. *Equal contribution. ¹UCSD. Correspondence to: Robi Bhattacharjee <rcbhatta@ucsd.edu>.
Pseudocode | Yes | Algorithm 1: Est(x, r, S). Algorithm 2: Data Copy Detect(S, q, m). Algorithm 3: Estimate k(S).
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to a code repository.
Open Datasets | Yes | Our data distribution, p, is the Halfmoon dataset with Gaussian noise (σ = 0.1). Our generated distribution, q, is trained from an i.i.d. sample of 2000 points from p, S ∼ p^2000.
Dataset Splits | No | The paper describes the generation of the distribution q and the use of a training sample S for q, but it does not provide explicit training, validation, or test dataset splits for the models or detector evaluated in the experiments.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models or other computing specifications used for the experiments.
Software Dependencies | No | The paper does not specify software dependencies with version numbers, such as programming languages or libraries used for implementation.
Experiment Setup | Yes | Our data distribution, p, is the Halfmoon dataset with Gaussian noise (σ = 0.1). To construct q_copy ... with a small amount of spherical noise (with radius 0.02). To construct q_underfit ... with a moderate amount of spherical noise (with radius 0.25). We fix λ = 20 and γ = 0.00025 as constants for data-copy detection. We directly set m = 200,000. For Est(x, r, S), we set b = 400. We set λ = 20 and γ = 1/4000.
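Since no code is released, the quoted setup can only be approximated. The following is a minimal sketch in Python with NumPy, not the authors' implementation. Several parts are assumptions: the half-moon parametrization, the reading of "spherical noise" as uniform noise inside a disc of the stated radius, and the choice to perturb uniformly chosen training points (the source elides how q_copy and q_underfit are actually constructed). All function names are hypothetical. Only the numeric constants come from the table above.

```python
import numpy as np

# Constants quoted in the Experiment Setup row.
SIGMA = 0.1        # Gaussian noise on the Halfmoon data
N_TRAIN = 2000     # size of the training sample S
LAMBDA = 20        # data-copy detection constant
GAMMA = 1 / 4000   # the same value as the quoted 0.00025
M = 200_000        # number of generated samples used by the detector
B = 400            # parameter b of Est(x, r, S)


def sample_halfmoon(n, sigma=SIGMA, rng=None):
    """Draw n points from two interleaved half circles plus Gaussian noise.

    The exact parametrization is an assumption; the paper only says
    "Halfmoon dataset with Gaussian noise".
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, np.pi, size=n)
    upper = rng.random(n) < 0.5
    x = np.where(upper, np.cos(theta), 1.0 - np.cos(theta))
    y = np.where(upper, np.sin(theta), 0.5 - np.sin(theta))
    points = np.column_stack([x, y])
    return points + rng.normal(scale=sigma, size=points.shape)


def sample_in_ball(n, radius, rng):
    """Draw n points uniformly from a 2-D disc (assumed reading of 'spherical noise')."""
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    dist = radius * np.sqrt(rng.random(n))  # sqrt gives uniform density over the area
    return np.column_stack([dist * np.cos(angle), dist * np.sin(angle)])


def perturb_training_points(S, n, radius, rng):
    """Hypothetical generator: pick training points at random and add disc noise."""
    idx = rng.integers(0, len(S), size=n)
    return S[idx] + sample_in_ball(n, radius, rng)


rng = np.random.default_rng(0)
S = sample_halfmoon(N_TRAIN, rng=rng)
q_copy_like = perturb_training_points(S, 1000, radius=0.02, rng=rng)
q_underfit_like = perturb_training_points(S, 1000, radius=0.25, rng=rng)
```

The two radii differ by more than a factor of ten, so under these assumptions the first generator stays very close to individual training points while the second spreads well past the σ = 0.1 noise scale of the data.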