reproducibilityindex.ai

Data-Copying in Generative Models: A Formal Framework

Authors: Robi Bhattacharjee, Sanjoy Dasgupta, Kamalika Chaudhuri

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We include an empirical comparison between our methods in Section 5.2, where we demonstrate that ours is able to capture certain forms of data-copying that theirs is not. We now return to the example presented in Figure 3 and empirically investigate the following question: is our algorithm able to outperform the one given in (Meehan et al., 2020) over this example?
Researcher Affiliation	Academia	Robi Bhattacharjee¹, Sanjoy Dasgupta¹, Kamalika Chaudhuri¹. *Equal contribution. ¹UCSD. Correspondence to: Robi Bhattacharjee <rcbhatta@ucsd.edu>.
Pseudocode	Yes	Algorithm 1: Est(x, r, S); Algorithm 2: Data Copy Detect(S, q, m); Algorithm 3: Estimate k(S)
Open Source Code	No	The paper does not provide any explicit statements about open-sourcing code or links to a code repository.
Open Datasets	Yes	Our data distribution, p, is the Halfmoon dataset with Gaussian noise (σ = 0.1). Our generated distribution, q, is trained from an i.i.d. sample of 2000 points from p, S ∼ p^2000.
Dataset Splits	No	The paper describes the generation of the distribution q and the use of a training sample S for q, but it does not provide explicit training, validation, or test dataset splits for the models or detector evaluated in the experiments.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU/CPU models or other computing specifications used for the experiments.
Software Dependencies	No	The paper does not specify software dependencies with version numbers, such as programming languages or libraries used for implementation.
Experiment Setup	Yes	Our data distribution, p, is the Halfmoon dataset with Gaussian noise (σ = 0.1). To construct q_copy... with a small amount of spherical noise (with radius 0.02). To construct q_underfit... with a moderate amount of spherical noise (with radius 0.25). We fix λ = 20 and γ = 0.00025 as constants for data-copy detection. We directly set m = 200,000. For Est(x, r, S), we set b = 400. We set λ = 20 and γ = 1/4000.
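
Since no code is released, the following is a minimal sketch of the data setup quoted in the Experiment Setup row, not the authors' implementation. The constants (σ, sample size, λ, γ, m, b) come from the record above. Everything else is an assumption made for illustration: the helper names sample_halfmoon and add_spherical_noise are hypothetical, the half-moon geometry follows the common two-interleaving-half-circles layout, and "spherical noise with radius r" is read here as a shift drawn uniformly from a disk of radius r. The record elides how q_copy and q_underfit choose the points to perturb, so that step is deliberately left out.

```python
import math
import random

# Constants quoted in the Experiment Setup row.
SIGMA = 0.1        # Gaussian noise on the Halfmoon data
N_TRAIN = 2000     # size of the training sample S
LAMBDA = 20        # data-copy detection constant
GAMMA = 1 / 4000   # the record states this both as 1/4000 and as 0.00025
M = 200_000        # value of m quoted in the record
B = 400            # parameter b of Est(x, r, S)


def sample_halfmoon(n, sigma, rng):
    """Draw n i.i.d. noisy points from two interleaving half circles.

    Assumption: the standard two-moons layout; the record does not
    state the exact generator used in the paper.
    """
    points = []
    for _ in range(n):
        t = rng.uniform(0.0, math.pi)
        if rng.random() < 0.5:
            x, y = math.cos(t), math.sin(t)              # upper moon
        else:
            x, y = 1.0 - math.cos(t), 0.5 - math.sin(t)  # lower moon
        points.append((x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)))
    return points


def add_spherical_noise(point, radius, rng):
    """Shift a 2-D point by a vector drawn uniformly from a disk.

    Assumption: one possible reading of "spherical noise (with radius r)".
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dist = radius * math.sqrt(rng.random())  # sqrt keeps density uniform over the disk
    return (point[0] + dist * math.cos(angle), point[1] + dist * math.sin(angle))


rng = random.Random(0)
S = sample_halfmoon(N_TRAIN, SIGMA, rng)  # training sample S of 2000 points
```

Calling add_spherical_noise(S[0], 0.02, rng) returns a point within distance 0.02 of S[0]; passing 0.25 instead gives the larger radius quoted for q_underfit.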