Data-Copying in Generative Models: A Formal Framework
Authors: Robi Bhattacharjee, Sanjoy Dasgupta, Kamalika Chaudhuri
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We include an empirical comparison between our method and that of Meehan et al. (2020) in Section 5.2, where we demonstrate that ours is able to capture certain forms of data-copying that theirs cannot. We now return to the example presented in Figure 3 and empirically investigate the following question: is our algorithm able to outperform the one given in (Meehan et al., 2020) on this example? |
| Researcher Affiliation | Academia | Robi Bhattacharjee¹, Sanjoy Dasgupta¹, Kamalika Chaudhuri¹ (*Equal contribution; ¹UCSD). Correspondence to: Robi Bhattacharjee <rcbhatta@ucsd.edu>. |
| Pseudocode | Yes | Algorithm 1: Est(x, r, S); Algorithm 2: Data Copy Detect(S, q, m); Algorithm 3: Estimate k(S) |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to a code repository. |
| Open Datasets | Yes | Our data distribution, p, is the Halfmoon dataset with Gaussian noise (σ = 0.1). Our generated distribution, q, is trained on an i.i.d. sample of 2000 points from p, S ∼ p^2000. |
| Dataset Splits | No | The paper describes the generation of the distribution q and the use of a training sample S for q, but it does not provide explicit training, validation, or test dataset splits for the models or detector evaluated in the experiments. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models or other computing specifications used for the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers, such as programming languages or libraries used for implementation. |
| Experiment Setup | Yes | Our data distribution, p, is the Halfmoon dataset with Gaussian noise (σ = 0.1). To construct q_copy... with a small amount of spherical noise (with radius 0.02). To construct q_underfit... with a moderate amount of spherical noise (with radius 0.25). We fix λ = 20 and γ = 0.00025 as constants for data-copy detection. We directly set m = 200,000. For Est(x, r, S), we set b = 400. We set λ = 20 and γ = 1/4000. |
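
The data setup described in the Open Datasets and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code (the paper releases none). Several choices are assumptions: the two-moons parameterization, reading "spherical noise with radius r" as an offset drawn uniformly from a disk of radius r, and the helper names `sample_halfmoons`, `uniform_in_disk`, and `make_noisy_resampler`, which are all hypothetical. The Experiment Setup row elides how q_copy and q_underfit are actually built, so the resampler here simply perturbs a uniformly chosen training point. That is one plausible way to obtain a distribution that hugs S (radius 0.02) or smooths it out (radius 0.25).

```python
import math
import random


def sample_halfmoons(n, sigma, rng):
    """Draw n points from an assumed two-moons distribution p.

    Each point lies on the upper arc (cos t, sin t) or the lower arc
    (1 - cos t, 0.5 - sin t) with equal probability, t ~ Uniform[0, pi],
    then gets isotropic Gaussian noise with standard deviation sigma.
    """
    points = []
    for _ in range(n):
        t = rng.uniform(0.0, math.pi)
        if rng.random() < 0.5:
            x, y = math.cos(t), math.sin(t)              # upper moon
        else:
            x, y = 1.0 - math.cos(t), 0.5 - math.sin(t)  # lower moon
        points.append((x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)))
    return points


def uniform_in_disk(radius, rng):
    """Uniform offset inside a 2-D disk; sqrt makes the density uniform in area."""
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)


def make_noisy_resampler(S, radius, rng):
    """Hypothetical generator: pick a random training point, then add disk noise."""
    def sample(m):
        out = []
        for _ in range(m):
            x, y = rng.choice(S)
            dx, dy = uniform_in_disk(radius, rng)
            out.append((x + dx, y + dy))
        return out
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    S = sample_halfmoons(2000, sigma=0.1, rng=rng)  # S ~ p^2000
    q_copy = make_noisy_resampler(S, radius=0.02, rng=rng)
    q_underfit = make_noisy_resampler(S, radius=0.25, rng=rng)
    print(len(q_copy(5)), len(q_underfit(5)))
```

With the small radius, every generated point lies within 0.02 of some training point. This is the behavior a data-copying detector is meant to flag. The larger radius spreads mass well beyond S. Note also that 1/4000 = 0.00025, so the two values of γ listed in the table agree.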