Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PFGuard: A Generative Framework with Privacy and Fairness Safeguards
Authors: Soyeon Kim, Yuji Roh, Geon Heo, Steven Whang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments to evaluate PFGuard's effectiveness in terms of fairness, privacy, and utility. We evaluate PFGuard on three image datasets: 1) MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) for various analyses and baseline comparisons, and 2) CelebA (Liu et al., 2015) to observe performance in real-world scenarios more closely related to privacy and fairness concerns. Table 1 shows the fairness and utility performances on synthetic data. Table 2 shows the fairness and utility performances on downstream tasks. |
| Researcher Affiliation | Collaboration | 1 KAIST, 2 Google |
| Pseudocode | Yes | Algorithm 1 Integrating PFGuard with PTEL-based generative models |
| Open Source Code | No | The reproducibility statement mentions providing a description of the algorithm and implementation details but does not explicitly state that source code for the methodology is released or provide a link. "REPRODUCIBILITY STATEMENT All datasets, methodologies, and experimental setups used in our study are described in detail in the supplementary material. More specifically, we provide a description of the proposed algorithm in Sec. C.2, details of datasets and preprocessing in Sec. D.1, and implementation details including hyperparameters in Sec. D.2 to ensure reproducibility." |
| Open Datasets | Yes | We evaluate PFGuard on three image datasets: 1) MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) for various analyses and baseline comparisons, and 2) CelebA (Liu et al., 2015) to observe performance in real-world scenarios more closely related to privacy and fairness concerns. |
| Dataset Splits | Yes | MNIST and Fashion-MNIST... Both datasets have 60,000 training examples and 10,000 testing examples. CelebA contains 202,599 celebrity face images. We use the official preprocessed version with face alignment and follow the official training and testing partition (Liu et al., 2015). |
| Hardware Specification | Yes | In all experiments, we use PyTorch and perform experiments using NVIDIA Quadro RTX 8000 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number for it or any other key software libraries or dependencies. "In all experiments, we use PyTorch and perform experiments using NVIDIA Quadro RTX 8000 GPUs." |
| Experiment Setup | No | The paper states that hyperparameters are taken from the official GitHub code of the baseline models, but does not explicitly list concrete hyperparameter values within the paper itself. "For all models, we refer to their official Github codes to implement their models and to use their best-performing hyperparameters for MNIST, Fashion MNIST, and CelebA." |