Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text

Authors: Matthieu Meeus, Lukas Wutschitz, Santiago Zanella-Beguelin, Shruti Tople, Reza Shokri

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we assume an adversary has access to some synthetic data generated by an LLM. We design membership inference attacks (MIAs) that target the training data used to fine-tune the LLM that is then used to synthesize data. The significant performance of our MIA shows that synthetic data leak information about the training data. Further, we find that canaries crafted for model-based MIAs are sub-optimal for privacy auditing when only synthetic data is released... We design our attacks adapting the state-of-the-art pairwise likelihood ratio tests as in RMIA (Zarifzadeh et al., 2024) and evaluate them on labeled datasets: SST-2 (Socher et al., 2013), AG News (Zhang et al., 2015) and SNLI (Bowman et al., 2015). Our results show that MIAs leveraging only synthetic data achieve AUC scores of 0.74 for SST-2, 0.68 for AG News and 0.77 for SNLI, largely outperforming a random guess baseline.
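The reported AUC scores (0.74 / 0.68 / 0.77) measure how well the attack's membership signal separates training members from non-members. A minimal sketch of that threshold-free evaluation is below; the scores are made up for illustration and the AUC routine is a plain pairwise-comparison implementation, not the paper's distributed RMIA pipeline.

```python
# Toy evaluation of a membership inference attack via AUC: the
# probability that a randomly chosen member receives a higher
# membership signal than a randomly chosen non-member (ties count half).

def auc(member_scores, nonmember_scores):
    """Area under the ROC curve via pairwise comparisons."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))

# Hypothetical membership signals: members tend to score higher.
members = [0.9, 0.8, 0.7, 0.6]
nonmembers = [0.5, 0.65, 0.4, 0.3]
print(auc(members, nonmembers))  # 0.9375
```

An AUC of 0.5 corresponds to the random-guess baseline the paper compares against; 1.0 would mean perfect separation.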
Researcher Affiliation Collaboration 1Imperial College London 2Microsoft 3National University of Singapore. Correspondence to: Matthieu Meeus <EMAIL>.
Pseudocode Yes Algorithm 1 Membership inference against an LLM-based synthetic text generator... Algorithm 2 Compute membership signal using n-gram model... Algorithm 3 Compute membership signal using similarity metric.
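In the spirit of Algorithm 2, a membership signal can be computed by fitting an n-gram model on the released synthetic corpus and scoring a candidate canary by its likelihood. The sketch below uses bigrams with Laplace smoothing and whitespace tokenization; these are illustrative choices, not the paper's exact setup.

```python
import math
from collections import Counter

def fit_bigrams(corpus):
    """Count unigrams and bigrams over a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_likelihood(sentence, unigrams, bigrams, vocab_size):
    """Smoothed bigram log-likelihood, used as a membership signal."""
    tokens = sentence.lower().split()
    score = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Laplace-smoothed conditional probability P(cur | prev)
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        score += math.log(p)
    return score

synthetic = ["the movie was great", "the movie was boring"]
uni, bi = fit_bigrams(synthetic)
vocab = len(uni)
# A canary whose phrasing the generator echoed scores higher than one it did not.
seen = log_likelihood("the movie was great", uni, bi, vocab)
unseen = log_likelihood("quantum llamas dislike rain", uni, bi, vocab)
assert seen > unseen
```

The attack's intuition is exactly this gap: text the fine-tuned generator has memorized leaves an echo in the synthetic corpus that an n-gram model picks up.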
Open Source Code Yes We provide experimental details in Section 4 and Appendix D. The datasets are publicly available, and we release the code necessary to reproduce our results on Github: https://aka.ms/canarysecho.
Open Datasets Yes We design our attacks adapting the state-of-the-art pairwise likelihood ratio tests as in RMIA (Zarifzadeh et al., 2024) and evaluate them on labeled datasets: SST-2 (Socher et al., 2013), AG News (Zhang et al., 2015) and SNLI (Bowman et al., 2015).
Dataset Splits No We consider three datasets that have been widely used to study text classification: (i) the Stanford Sentiment Treebank (SST-2) (Socher et al., 2013), which consists of excerpts from written movie reviews with a binary sentiment label; (ii) the AG News dataset (Zhang et al., 2015), which consists of news articles labelled by category (World, Sport, Business, Sci/Tech); and (iii) the SNLI dataset (Bowman et al., 2015), which consists of premises and hypotheses labeled as entailment, contradiction or neutral. In all experiments, we remove examples with less than 5 words, bringing the total number of examples to 43 296 for SST-2 and 120 000 for AG News. For SNLI, we selected the first 100 000 records. The paper mentions total record counts for each dataset but does not explicitly provide training/test/validation dataset splits with percentages or specific counts for the main LLM fine-tuning.
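The stated preprocessing step (dropping examples with fewer than 5 words) can be reproduced in a few lines. Whitespace tokenization is an assumption here; the paper does not specify how words are counted.

```python
# Illustrative version of the preprocessing described in the paper:
# remove examples with fewer than 5 whitespace-separated words.
def filter_short(examples, min_words=5):
    return [ex for ex in examples if len(ex.split()) >= min_words]

samples = ["too short", "this example has five words",
           "one two three four"]
print(filter_short(samples))  # keeps only the five-word example
```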
Hardware Specification Yes All our experiments have been conducted on a cluster of nodes with 8 V100 NVIDIA GPUs with a floating point precision of 16 (fp16).
Software Dependencies No We built our experiments on two open-source packages: (i) privacy-estimates which provides a distributed implementation of the RMIA attack and (ii) dp-transformers which provides the implementation of the synthetic data generator. The paper mentions the names of software packages used but does not provide specific version numbers for them.
Experiment Setup Yes We optimized training hyperparameters for LoRA fine-tuning Mistral-7B on SST-2 by running a grid search over learning rate ([1×10^-6, 4×10^-6, 2×10^-5, 6×10^-5, 3×10^-4, 1×10^-3]) and batch size ([64, 128, 256]). We fine-tuned the models for 3 epochs and observed the validation loss plateaued after the first epoch. Based on these results, we selected a learning rate of 2×10^-5, effective batch size of 128, sequence length 128, LoRA r = 4 and fine-tuned the models for 1 epoch.
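The reported grid can be enumerated directly to confirm its size. The hyperparameter values below come from the quoted setup; the dict layout is a placeholder, not the paper's actual training configuration code.

```python
import itertools

# Grid from the reported search for LoRA fine-tuning Mistral-7B on SST-2.
learning_rates = [1e-6, 4e-6, 2e-5, 6e-5, 3e-4, 1e-3]
batch_sizes = [64, 128, 256]

grid = [{"lr": lr, "batch_size": bs}
        for lr, bs in itertools.product(learning_rates, batch_sizes)]
print(len(grid))  # 18 configurations (6 learning rates x 3 batch sizes)

# Configuration reportedly selected after the grid search:
selected = {"lr": 2e-5, "batch_size": 128, "seq_len": 128,
            "lora_r": 4, "epochs": 1}
```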