Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PASS: Private Attributes Protection with Stochastic Data Substitution
Authors: Yizhuo Chen, Chun-Fu Chen, Hsiang Hsu, Shaohan Hu, Tarek F. Abdelzaher
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The comprehensive evaluation of PASS on various datasets of different modalities, including facial images, human activity sensory signals, and voice recording datasets, substantiates PASS's effectiveness and generalizability. 5. Experiments: We thoroughly evaluated PASS on three multi-attribute benchmark datasets, each representing a different application of a different modality. These datasets include AudioMNIST (Becker et al., 2018), containing recordings of human voices; MotionSense (Malekzadeh et al., 2019), consisting of human activity sensory signals; and CelebA (Liu et al., 2015), containing facial images. |
| Researcher Affiliation | Collaboration | Yizhuo Chen 1 2 Chun-Fu (Richard) Chen 2 Hsiang Hsu 2 Shaohan Hu 2 Tarek Abdelzaher 1 ... 1Department of Computer Science, University of Illinois Urbana-Champaign, USA 2Global Technology Applied Research, JPMorgan Chase, USA. Correspondence to: Yizhuo Chen <EMAIL>, Chun-Fu (Richard) Chen <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 PASS Training Pseudo-code ... Algorithm 2 PASS Inference Pseudo-code |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, a link to a repository, or mentions of code in supplementary materials. |
| Open Datasets | Yes | We thoroughly evaluated PASS on three multi-attribute benchmark datasets, each representing a different application of a different modality. These datasets include AudioMNIST (Becker et al., 2018), containing recordings of human voices; MotionSense (Malekzadeh et al., 2019), consisting of human activity sensory signals; and CelebA (Liu et al., 2015), containing facial images. |
| Dataset Splits | Yes | AudioMNIST Dataset ... The dataset contains 30,000 audio clips, divided into 24,000 for training and 6,000 for validation. ... MotionSense Dataset ... segmented the datasets into 74,324 samples, each with a length of 128. ... Table 6. Training-testing split 7:4 ... CelebA Dataset ... We used the official split for training and validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory amounts. |
| Software Dependencies | No | Table 6 lists 'Optimizer AdamW (Loshchilov & Hutter, 2019)' but does not specify the version of the underlying deep learning framework (e.g., PyTorch, TensorFlow) or any other key libraries with their version numbers. Appendix E.1 mentions 'HuBERT-B (Hsu et al., 2021)' but this refers to a model used for feature extraction, not a general software dependency with version details. |
| Experiment Setup | Yes | Table 6. Detailed configurations of our experiments datasets, models, and optimization techniques. Optimizer AdamW (Loshchilov & Hutter, 2019) Learning rate 0.001 0.001 0.0001 Weight decay 0.0001 Learning rate scheduler Cosine scheduler Embeddings f(x) and g(x') dimension 512 Pθ(X'|X) training epochs 2000 200 50 Probing Attack training epochs 2000 200 50 ... Unless otherwise specified, we set λ = N/M and µ = 0.2N throughout our experiments to balance private attributes protection, useful attributes preservation, and general feature preservation. The substitute dataset is constructed by randomly sampling 4096 data points from the training dataset. |
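The quoted setup fixes the trade-off weights as λ = N/M and µ = 0.2N and builds the substitute dataset by sampling 4096 training points. A minimal stdlib-only sketch of those two rules follows; the concrete values of N (training-set size) and M, the function names, and the index-based sampling are illustrative assumptions, not details from the paper:

```python
import random

def pass_hyperparameters(n_train, m):
    """Hyperparameter rule quoted from the paper: lambda = N/M, mu = 0.2N.

    n_train -- N, the number of training samples (hypothetical here)
    m       -- M, as defined in the paper (hypothetical here)
    """
    lam = n_train / m
    mu = 0.2 * n_train
    return lam, mu

def sample_substitute_dataset(train_indices, k=4096, seed=0):
    """Draw k distinct training points (by index) to form the substitute dataset."""
    rng = random.Random(seed)
    return rng.sample(train_indices, k)

# Illustrative numbers only: AudioMNIST has 24,000 training clips per the
# splits row above; M = 10 is an assumed value for demonstration.
lam, mu = pass_hyperparameters(24000, 10)
substitute = sample_substitute_dataset(list(range(24000)))
```

With these assumed values, the sketch yields λ = 2400 and µ = 4800, and `substitute` holds 4096 distinct indices; the seed is fixed only to make the example deterministic.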