Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PFGuard: A Generative Framework with Privacy and Fairness Safeguards
Authors: Soyeon Kim, Yuji Roh, Geon Heo, Steven Whang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments to evaluate PFGuard's effectiveness in terms of fairness, privacy, and utility. We evaluate PFGuard on three image datasets: 1) MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) for various analyses and baseline comparisons, and 2) CelebA (Liu et al., 2015) to observe performance in real-world scenarios more closely related to privacy and fairness concerns. Table 1 shows the fairness and utility performances on synthetic data. Table 2 shows the fairness and utility performances on downstream tasks. |
| Researcher Affiliation | Collaboration | 1 KAIST, 2 Google |
| Pseudocode | Yes | Algorithm 1 Integrating PFGuard with PTEL-based generative models |
| Open Source Code | No | The reproducibility statement mentions providing a description of the algorithm and implementation details but does not explicitly state that source code for the methodology is released or provide a link. "REPRODUCIBILITY STATEMENT All datasets, methodologies, and experimental setups used in our study are described in detail in the supplementary material. More specifically, we provide a description of the proposed algorithm in Sec. C.2, details of datasets and preprocessing in Sec. D.1, and implementation details including hyperparameters in Sec. D.2 to ensure reproducibility." |
| Open Datasets | Yes | We evaluate PFGuard on three image datasets: 1) MNIST (LeCun et al., 1998) and Fashion-MNIST (Xiao et al., 2017) for various analyses and baseline comparisons, and 2) CelebA (Liu et al., 2015) to observe performance in real-world scenarios more closely related to privacy and fairness concerns. |
| Dataset Splits | Yes | MNIST and Fashion-MNIST... Both datasets have 60,000 training examples and 10,000 testing examples. CelebA contains 202,599 celebrity face images. We use the official preprocessed version with face alignment and follow the official training and testing partition (Liu et al., 2015). |
| Hardware Specification | Yes | In all experiments, we use PyTorch and perform experiments using NVIDIA Quadro RTX 8000 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number for it or any other key software libraries or dependencies. "In all experiments, we use PyTorch and perform experiments using NVIDIA Quadro RTX 8000 GPUs." |
| Experiment Setup | No | The paper states that hyperparameters are taken from the official GitHub code of the baseline models, but does not explicitly list concrete hyperparameter values within the paper itself. "For all models, we refer to their official Github codes to implement their models and to use their best-performing hyperparameters for MNIST, Fashion MNIST, and CelebA." |