Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Self-Refining Language Model Anonymizers via Adversarial Distillation

Authors: Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on SynthPAI, a dataset of synthetic personal profiles and text comments, demonstrate that SLMs trained with SEAL achieve substantial improvements in anonymization capabilities. Notably, 8B models attain a privacy-utility trade-off comparable to that of the GPT-4 anonymizer and, with self-refinement, even surpass it in terms of privacy protection. These results highlight the effectiveness of our adversarial distillation framework for training SLMs as efficient anonymizers.
Researcher Affiliation	Academia	Kyuyoung Kim¹, Hyunjun Jeon¹, Jinwoo Shin¹ ... This research was supported in part by Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2019-II190075, Artificial Intelligence Graduate School Support Program (KAIST); No. RS-2021-II212068, Artificial Intelligence Innovation Hub).
Pseudocode	Yes	Algorithm 1 SEAL: Self-refining anonymizer training
Open Source Code	Yes	Code and implementation details are available at https://github.com/kykim0/SEAL
Open Datasets	Yes	To facilitate further research, we release the full dataset used in our experiments.
Dataset Splits	Yes	From this dataset, we select 3,456 instances with high-quality human labels covering eight personal attributes: age, education level, gender, income level, location, marital status, occupation, and place of birth. These comments serve as the initial texts for anonymization. To generate anonymization trajectories for distillation, we simulate adversarial anonymization [9] for up to three steps, using GPT-4o for anonymization, attribute inference, and utility assessment. We use 275 of the 300 synthetic profiles for trajectory generation and hold out the remaining 25 for evaluation, which we refer to as the main eval dataset. This setup illustrates a practical use case of our framework: distillation data can be generated on synthetic profiles using external LLMs, while the distilled SLMs can subsequently operate on real, internal data locally, without invoking potentially untrusted external models. Our analysis of SynthPAI shows that texts with contextually embedded personal information as opposed to explicit identifiers are significantly harder to anonymize. To evaluate performance on these more challenging cases, we construct 500 additional synthetic texts containing such embedded personal information using the 25 held-out profiles, forming the hard eval dataset.
Hardware Specification	Yes	We used NVIDIA A6000 GPUs for all of our experiments.
Software Dependencies	Yes	For both SFT and DPO, we used the AdamW optimizer [26] with default hyperparameters: β1 of 0.9, β2 of 0.999, and ϵ of 1e-8. All models were trained with FlashAttention-2 [27] enabled.
Experiment Setup	Yes	Table 6: Hyperparameters used for SFT and DPO. For SFT, models trained on both anonymization and critique tasks were trained for one epoch, while those trained on anonymization only were trained for two epochs. DPO was applied for one epoch using the anonymization preference data described in Section 4.2. The same settings were used for both Llama and Qwen models.
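
Illustrative sketch (not taken from the paper or its repository): the Dataset Splits entry describes a profile-level holdout, with 275 of the 300 synthetic profiles used for trajectory generation and the remaining 25 held out for evaluation. The Python below shows one way such a split could be made so that no held-out profile contributes comments to the training side. The random selection, the seed, the `profile_id` field, and the function names `split_profiles` and `split_comments` are all assumptions for illustration; the quoted text does not say how the 25 profiles were chosen.

```python
import random


def split_profiles(profile_ids, n_eval=25, seed=0):
    """Hold out n_eval whole profiles. Returns (train_ids, eval_ids) as sets.

    Assumption: profiles are chosen uniformly at random with a fixed seed.
    The source only states the 275/25 counts, not the selection procedure.
    """
    ids = sorted(profile_ids)          # sort first so the result is deterministic
    random.Random(seed).shuffle(ids)   # local RNG, does not touch global state
    return set(ids[n_eval:]), set(ids[:n_eval])


def split_comments(comments, train_ids, eval_ids):
    """Route each comment to the side its author profile belongs to."""
    train = [c for c in comments if c["profile_id"] in train_ids]
    held_out = [c for c in comments if c["profile_id"] in eval_ids]
    return train, held_out


# Hypothetical toy data: 300 profiles with two comments each.
profile_ids = [f"profile_{i:03d}" for i in range(300)]
comments = [
    {"profile_id": pid, "text": f"comment {k} by {pid}"}
    for pid in profile_ids
    for k in range(2)
]

train_ids, eval_ids = split_profiles(profile_ids)
train_comments, eval_comments = split_comments(comments, train_ids, eval_ids)

print(len(train_ids), len(eval_ids))  # 275 25
```

Splitting by profile rather than by comment keeps every text written by a held-out profile out of the distillation data, which matches the entry's description of evaluating on profiles unseen during trajectory generation.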