Overlearning Reveals Sensitive Attributes
Authors: Congzheng Song, Vitaly Shmatikov
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate overlearning in several vision and NLP models and analyze its harmful consequences. To analyze where and why overlearning happens, we empirically show how general features emerge in the lower layers of models trained for simple objectives and conjecture an explanation based on the complexity of the training data. |
| Researcher Affiliation | Academia | Congzheng Song, Cornell University (cs2296@cornell.edu); Vitaly Shmatikov, Cornell Tech (shmat@cs.cornell.edu) |
| Pseudocode | Yes | Figure 1 (pseudo-code for inference from representation and adversarial re-purposing) and Algorithm 1 (de-censoring representations), both on page 3; see the sketches after the table. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Health is the Heritage Health dataset (Heritage Health Prize)... UTKFace is a set of over 23,000 face images labeled with age, gender, and race (UTKFace; Zhang et al., 2017)... Places365 is a set of 1.8 million images labeled with 365 fine-grained scene categories... Twitter is a set of tweets from the PAN16 dataset (Rangel et al., 2016)... Yelp is a set of Yelp reviews labeled with user identities (Yelp Open Dataset)... PIPA is a set of over 60,000 photos of 2,000 individuals gathered from public Flickr photo albums (Piper project page; Zhang et al., 2015). |
| Dataset Splits | No | We use 80% of the data for training the target models and 20% for evaluation. This indicates a train/test split, but no explicit validation split is mentioned (see the configuration sketch after the table). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components such as a text CNN, LeNet, AlexNet, and the Adam optimizer, but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | The number of epochs is 50 for censoring with adversarial training, 30 for the other models. We use the Adam optimizer with a learning rate of 0.001 and batch size of 128 (Section 4.2). We fine-tune all models for 50 epochs with batch size of 32; the other hyper-parameters are as in Section 4.2 (Section 4.3). See the configuration sketch after the table. |
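
The sketches below supplement the table. First, for the Pseudocode row: the paper's Figure 1 gives pseudo-code for inference from a model's representation, i.e., an attacker who observes an intermediate representation trains a separate classifier to predict a sensitive attribute that was never a training label. The PyTorch sketch below is an illustration under stated assumptions, not the authors' released code; `target_model.features`, the MLP architecture, and the attack data loader are hypothetical names.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


class AttackClassifier(nn.Module):
    """Small MLP mapping an observed representation to a sensitive attribute."""

    def __init__(self, repr_dim: int, num_sensitive_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(repr_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_sensitive_classes),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def train_inference_attack(target_model, attack_loader: DataLoader,
                           repr_dim: int, num_sensitive_classes: int,
                           epochs: int = 30, lr: float = 1e-3):
    """Train an attack classifier on representations from the frozen target.

    `attack_loader` yields (x, sensitive_label) pairs from the attacker's
    auxiliary data; the target model is only queried, never updated.
    """
    target_model.eval()
    attacker = AttackClassifier(repr_dim, num_sensitive_classes)
    opt = torch.optim.Adam(attacker.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, s in attack_loader:
            with torch.no_grad():
                z = target_model.features(x)  # observed intermediate representation
            loss = loss_fn(attacker(z), s)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return attacker
```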
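
For Algorithm 1 (de-censoring representations), the paper describes an adversary who, on auxiliary data, trains a translation network that maps a censored representation toward the representation of a reference model trained without censoring, and then runs the inference attack on the translated representation. The sketch below is an interpretation of that idea; the translator architecture, the L2 objective, and the frozen `features` accessors are assumptions for illustration.

```python
import torch
import torch.nn as nn


class DeCensor(nn.Module):
    """Maps a censored representation toward an uncensored reference space."""

    def __init__(self, repr_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(repr_dim, repr_dim),
            nn.ReLU(),
            nn.Linear(repr_dim, repr_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def train_decensor(censored_model, reference_model, aux_loader,
                   repr_dim: int, epochs: int = 30, lr: float = 1e-3):
    """Align censored representations with an uncensored reference model.

    `aux_loader` yields inputs x from the adversary's auxiliary data; both
    feature extractors stay frozen, only the translator is trained.
    """
    censored_model.eval()
    reference_model.eval()
    translator = DeCensor(repr_dim)
    opt = torch.optim.Adam(translator.parameters(), lr=lr)
    mse = nn.MSELoss()

    for _ in range(epochs):
        for x in aux_loader:
            with torch.no_grad():
                z_censored = censored_model.features(x)
                z_reference = reference_model.features(x)
            loss = mse(translator(z_censored), z_reference)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return translator
```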
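
Finally, the Dataset Splits and Experiment Setup rows can be collected into one configuration sketch. Only the quoted values (80%/20% split, Adam with learning rate 0.001, batch size 128, 30 epochs or 50 with adversarial censoring, batch size 32 for fine-tuning) come from the paper; the dataclass and split helper are illustrative conveniences, not the authors' code.

```python
import random
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Values reported in Sections 4.2 and 4.3 of the paper.
    learning_rate: float = 1e-3   # Adam optimizer
    batch_size: int = 128         # 32 for the fine-tuning experiments
    epochs: int = 30              # 50 when censoring with adversarial training
    train_fraction: float = 0.8   # 80% train / 20% evaluation; no validation split reported


def split_indices(n_examples: int, cfg: TrainConfig, seed: int = 0):
    """Shuffle indices and return (train, eval) index lists per the 80/20 split."""
    rng = random.Random(seed)
    idx = list(range(n_examples))
    rng.shuffle(idx)
    cut = int(cfg.train_fraction * n_examples)
    return idx[:cut], idx[cut:]
```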