Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Label Leakage and Protection in Two-party Split Learning

Authors: Oscar Li, Jiankai Sun, Xin Yang, Weihao Gao, Hongyi Zhang, Junyuan Xie, Virginia Smith, Chong Wang

ICLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically demonstrate the effectiveness of our protection techniques against the identified attacks, and show that Marvell in particular has improved privacy-utility tradeoffs relative to baseline approaches. ... We experimentally demonstrate the effectiveness of our protection techniques and MARVELL’s improved privacy-utility tradeoffs compared to other protection baselines (Section 5). ... In this section, we first describe our experiment setup and then demonstrate the label protection quality of Marvell as well as its privacy-utility trade-off relative to baseline approaches. Empirical Setup. We use three real-world binary classification datasets for evaluation: Criteo and Avazu, two online advertising prediction datasets with millions of examples; and ISIC, a healthcare image dataset for skin cancer prediction.
Researcher Affiliation	Collaboration	Oscar Li¹, Jiankai Sun², Xin Yang², Weihao Gao², Hongyi Zhang², Junyuan Xie², Virginia Smith¹, Chong Wang² (¹Carnegie Mellon University; ²Byte Dance Inc.)
Pseudocode	Yes	Algorithm 1: Marvell algorithm
Open Source Code	Yes	Code available at https://github.com/OscarcarLi/label-protection
Open Datasets	Yes	We use three real-world binary classification datasets for evaluation: Criteo and Avazu, two online advertising prediction datasets with millions of examples; and ISIC, a healthcare image dataset for skin cancer prediction. ... Criteo. Criteo display advertising challenge, 2014. URL https://www.kaggle.com/c/criteo-display-ad-challenge/data. ... Avazu. Avazu click-through rate prediction, 2015. URL https://www.kaggle.com/c/avazu-ctr-prediction/data. ... ISIC. SIIM-ISIC Melanoma Classification, 2020. URL https://www.kaggle.com/c/siim-isic-melanoma-classification/data.
Dataset Splits	No	The paper specifies train-test splits (e.g., "90%-10% train-test split" for Criteo and Avazu, "80%-20% training and test split" for ISIC), but it does not mention a distinct validation set split.
Hardware Specification	Yes	We conduct our experiments over 16 Nvidia 1080Ti GPU card.
Software Dependencies	No	The paper mentions using the "Adam optimizer" but does not specify its version or any other software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup	Yes	[Criteo] We use the Adam optimizer with a batch size of 1024 and a learning rate of 1e-4 throughout the entire training of 5 epochs (approximately 20k stochastic gradient updates). [ISIC] We use the Adam optimizer with a batch size of 128 and a learning rate of 1e-5 throughout the entire training of 1000 epochs (approximately 35k stochastic gradient updates). [Avazu] We use the Adam optimizer with a batch size of 32768 and a learning rate of 1e-4 throughout the entire training of 5 epochs (approximately 5.5k stochastic gradient updates).
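The Experiment Setup row can be cross-checked with simple arithmetic. If each epoch makes roughly ⌈N/B⌉ updates for N training examples and batch size B, then N ≈ (updates / epochs) × B. The Python sketch below records the quoted hyperparameters and back-computes this implied training-set size. The `TrainConfig` class and the helper functions are hypothetical names introduced for illustration, not part of the authors' code. Because the paper reports update counts only approximately, the derived sizes are rough estimates, not figures reported in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the Experiment Setup row (hypothetical container)."""
    dataset: str
    batch_size: int
    learning_rate: float
    epochs: int
    approx_updates: int  # approximate total stochastic gradient updates, as reported
    optimizer: str = "Adam"


CONFIGS = [
    TrainConfig("Criteo", batch_size=1024, learning_rate=1e-4, epochs=5, approx_updates=20_000),
    TrainConfig("ISIC", batch_size=128, learning_rate=1e-5, epochs=1000, approx_updates=35_000),
    TrainConfig("Avazu", batch_size=32768, learning_rate=1e-4, epochs=5, approx_updates=5_500),
]


def updates_per_epoch(cfg: TrainConfig) -> float:
    """Average number of gradient updates per epoch implied by the reported totals."""
    return cfg.approx_updates / cfg.epochs


def implied_train_examples(cfg: TrainConfig) -> int:
    """Rough training-set size per epoch: updates per epoch times batch size.

    Ignores the partial last batch, so this is only an estimate.
    """
    return round(updates_per_epoch(cfg) * cfg.batch_size)


if __name__ == "__main__":
    for cfg in CONFIGS:
        print(
            f"{cfg.dataset}: ~{updates_per_epoch(cfg):.0f} updates/epoch, "
            f"~{implied_train_examples(cfg):,} training examples"
        )
```

Running it prints one line per dataset, for example `ISIC: ~35 updates/epoch, ~4,480 training examples`. Comparing such implied sizes against the stated train-test splits is one way a reproducer might spot subsampling or a misreported setting before launching a full run.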