Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Self-Refining Language Model Anonymizers via Adversarial Distillation

Authors: Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on SynthPAI, a dataset of synthetic personal profiles and text comments, demonstrate that SLMs trained with SEAL achieve substantial improvements in anonymization capabilities. Notably, 8B models attain a privacy-utility trade-off comparable to that of the GPT-4 anonymizer and, with self-refinement, even surpass it in terms of privacy protection. These results highlight the effectiveness of our adversarial distillation framework for training SLMs as efficient anonymizers.
Researcher Affiliation | Academia | Kyuyoung Kim¹, Hyunjun Jeon¹, Jinwoo Shin¹ ... This research was supported in part by an Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2019-II190075, Artificial Intelligence Graduate School Support Program (KAIST); No. RS-2021-II212068, Artificial Intelligence Innovation Hub).
Pseudocode | Yes | Algorithm 1. SEAL: Self-refining anonymizer training
Open Source Code | Yes | Code and implementation details are available at https://github.com/kykim0/SEAL
Open Datasets | Yes | To facilitate further research, we release the full dataset used in our experiments.
Dataset Splits | Yes | From this dataset, we select 3,456 instances with high-quality human labels covering eight personal attributes: age, education level, gender, income level, location, marital status, occupation, and place of birth. These comments serve as the initial texts for anonymization. To generate anonymization trajectories for distillation, we simulate adversarial anonymization [9] for up to three steps, using GPT-4o for anonymization, attribute inference, and utility assessment. We use 275 of the 300 synthetic profiles for trajectory generation and hold out the remaining 25 for evaluation, which we refer to as the main eval dataset. This setup illustrates a practical use case of our framework: distillation data can be generated on synthetic profiles using external LLMs, while the distilled SLMs can subsequently operate on real, internal data locally, without invoking potentially untrusted external models. Our analysis of SynthPAI shows that texts with contextually embedded personal information, as opposed to explicit identifiers, are significantly harder to anonymize. To evaluate performance on these more challenging cases, we construct 500 additional synthetic texts containing such embedded personal information using the 25 held-out profiles, forming the hard eval dataset.
Hardware Specification | Yes | We used NVIDIA A6000 GPUs for all of our experiments.
Software Dependencies | Yes | For both SFT and DPO, we used the AdamW optimizer [26] with default hyperparameters: β1 of 0.9, β2 of 0.999, and ε of 1e-8. All models were trained with FlashAttention-2 [27] enabled.
Experiment Setup | Yes | Table 6: Hyperparameters used for SFT and DPO. For SFT, models trained on both anonymization and critique tasks were trained for one epoch, while those trained on anonymization only were trained for two epochs. DPO was applied for one epoch using the anonymization preference data described in Section 4.2. The same settings were used for both Llama and Qwen models.
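
The Dataset Splits entry describes simulating adversarial anonymization for up to three steps, with GPT-4o acting as anonymizer, attribute-inference adversary, and utility judge. A loop of that shape can be sketched as follows. This sketch rests on several assumptions. The three roles are passed in as plain Python callables (in practice each would be an LLM call). The loop stops early once the adversary infers nothing. The names `adversarial_anonymization`, `Step`, `toy_infer`, `toy_anonymize`, and `toy_utility` are all hypothetical and are not taken from the SEAL repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attributes = Dict[str, str]  # attribute name -> value the adversary guesses


@dataclass
class Step:
    text: str             # anonymized text after this round
    inferred: Attributes  # what the adversary still infers from it
    utility: float        # utility of this text relative to the original


def adversarial_anonymization(
    original: str,
    anonymize: Callable[[str, Attributes], str],
    infer: Callable[[str], Attributes],
    utility: Callable[[str, str], float],
    max_steps: int = 3,
) -> List[Step]:
    """Alternate attribute inference and anonymization for at most max_steps rounds."""
    trajectory: List[Step] = []
    text = original
    inferred = infer(text)
    for _ in range(max_steps):
        if not inferred:  # assumption: stop once the adversary infers nothing
            break
        text = anonymize(text, inferred)
        inferred = infer(text)
        trajectory.append(Step(text, inferred, utility(original, text)))
    return trajectory


# Hypothetical stand-ins for the three LLM roles, for illustration only.
PROFILE = {"location": "Seoul", "occupation": "nurse"}


def toy_infer(text: str) -> Attributes:
    # "Infers" an attribute whenever its literal value appears in the text.
    return {k: v for k, v in PROFILE.items() if v in text}


def toy_anonymize(text: str, inferred: Attributes) -> str:
    # Deliberately weak: redacts only the first inferred value per round.
    _, value = next(iter(inferred.items()))
    return text.replace(value, "<redacted>")


def toy_utility(original: str, text: str) -> float:
    # Fraction of word positions left unchanged.
    a, b = original.split(), text.split()
    return sum(x == y for x, y in zip(a, b)) / len(a)


trajectory = adversarial_anonymization(
    "I work as a nurse in Seoul", toy_anonymize, toy_infer, toy_utility
)
```

Each `Step` pairs an anonymized text with the adversary's remaining inferences and a utility score. That is the kind of record from which a distillation dataset could be assembled. How SEAL actually stores and filters its trajectories is not described in this entry.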
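
The Dataset Splits entry holds out 25 of the 300 synthetic profiles for evaluation and uses the other 275 for trajectory generation. Splitting by profile rather than by comment keeps every text tied to a held-out profile out of the distillation data. The sketch below assumes a seeded random choice of held-out profiles, since the entry does not say how the 25 were selected. It also assumes a hypothetical `profile_id` field on each text. The helper names `split_profiles` and `partition_texts` are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

Text = Dict[str, str]  # hypothetical record with at least a "profile_id" key


def split_profiles(
    profile_ids: Sequence[str], n_holdout: int = 25, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Return (trajectory_ids, eval_ids); random selection and seed are assumptions."""
    ids = sorted(set(profile_ids))  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(ids)
    return ids[n_holdout:], ids[:n_holdout]


def partition_texts(
    texts: List[Text], eval_ids: Sequence[str]
) -> Tuple[List[Text], List[Text]]:
    """Send each text to the side of the split that its profile belongs to."""
    held_out = set(eval_ids)
    train = [t for t in texts if t["profile_id"] not in held_out]
    evaluation = [t for t in texts if t["profile_id"] in held_out]
    return train, evaluation


# Toy example: 300 hypothetical profile IDs with two comments each.
profiles = [f"profile_{i:03d}" for i in range(300)]
trajectory_ids, eval_ids = split_profiles(profiles)
texts = [{"profile_id": pid, "text": f"comment {j}"} for pid in profiles for j in range(2)]
train_texts, eval_texts = partition_texts(texts, eval_ids)
```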
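
The Software Dependencies entry lists the AdamW defaults β1 = 0.9, β2 = 0.999, and ε = 1e-8. For reference, one AdamW update with those values is written out below in plain Python. The learning rate and weight decay are placeholders (the defaults of PyTorch's `torch.optim.AdamW`, 1e-3 and 0.01), because the values actually used are in Table 6, which is not reproduced here. The function name `adamw_step` is hypothetical.

```python
import math
from typing import List, Tuple


def adamw_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,                      # 1-based step count
    lr: float = 1e-3,            # placeholder, not the paper's value
    weight_decay: float = 1e-2,  # placeholder, not the paper's value
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """Apply one AdamW update; returns the new params, first moments, and second moments."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Decoupled weight decay: shrink the weight directly, not through the gradient.
        p = p * (1.0 - lr * weight_decay)
        # Exponential moving averages of the gradient and the squared gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction for moments that start at zero.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

In a real training run, the same per-element update is what `torch.optim.AdamW(model.parameters(), betas=(0.9, 0.999), eps=1e-8)` applies to each parameter tensor. The list-based version only makes the arithmetic explicit.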
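
The Experiment Setup entry applies DPO for one epoch on anonymization preference pairs. The standard DPO objective for a single pair is -log σ(β · [(log πθ(y_w|x) − log π_ref(y_w|x)) − (log πθ(y_l|x) − log π_ref(y_l|x))]). It can be computed from summed response log-probabilities under the trained policy and the frozen reference model, as sketched below. The value β = 0.1 is a placeholder, not the setting from Table 6, and `dpo_loss` is a hypothetical helper rather than code from the SEAL repository.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # placeholder, not the paper's setting
) -> float:
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response:
    "chosen" is the preferred anonymization, "rejected" the dispreferred one.
    """
    # Implicit reward margin: how much more the policy, relative to the
    # reference model, favors the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) equals softplus(-x); branch to avoid overflow in exp.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below log 2 as the policy shifts probability toward the preferred anonymization, relative to the reference model.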