A Group-Based Personalized Model for Image Privacy Classification and Labeling

Authors: Haoti Zhong, Anna Squicciarini, David Miller, Cornelia Caragea

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments show that, on a dataset of 114 users and about 3,400 image labelings, our model achieves an overall accuracy measure of 79.31% when a few (15) images are used to infer group associations for each (test) user."
Researcher Affiliation | Academia | Haoti Zhong, Dept. of Electrical Eng., Pennsylvania State University (hzz133@psu.edu); Anna Squicciarini, Information Sciences and Technology, Pennsylvania State University (acs20@psu.edu); David Miller, Dept. of Electrical Eng., Pennsylvania State University (djmiller@engr.psu.edu); Cornelia Caragea, Department of Computer Science, University of North Texas (ccaragea@unt.edu)
Pseudocode | No | The paper describes its Expectation-Maximization (EM) algorithm and gradient-ascent procedure through mathematical equations and textual descriptions, but it does not include a structured pseudocode block or algorithm listing.
Open Source Code | No | The paper makes no explicit statement about releasing source code for its methodology and provides no link to a code repository.
Open Datasets | No | "The imageset was taken from the Picalert study, a collection of images with varying degrees of sensitivity [Zerr et al., 2012]. We collected our own dataset for testing purposes as follows... In total, 114 valid user responses were collected and 3420 labels in total (2496 public labels and 924 private labels)."
Dataset Splits | Yes | "We first divide the dataset into 10 (outer) folds, and use 9 of these folds for training-plus-validation, with the last fold used for testing. To calculate the optimized hyper-parameter, we further split the collection of nine training-plus-validation fold samples, again using 5-fold cross validation, with four of these (inner) folds used for training and one for validation."
Hardware Specification | No | The paper mentions using deep learning (via Caffe) for feature extraction, but it does not specify any hardware details, such as GPU models, CPU types, or memory, used to run the experiments.
Software Dependencies | No | The paper mentions using Caffe [Jia et al., 2014] for deep-learning features and SVMs for baselines, but it does not provide version numbers for these or any other software dependencies, which reproducibility would require.
Experiment Setup | Yes | "The search grid for K is chosen from 4 to 7 with search step of 1, and M is chosen over a range from 20-50 with a search step of 5, to maximize the average (inner) validation fold CV accuracy. We found that M=40 and K = 6 fit best for this dataset. Larger M and K may be found for larger datasets (with more users). L was chosen to be the minimum number such that the patches cover 90% of the image support. Thus, L = 100."
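The nested cross-validation protocol quoted above (10 outer folds; an inner 5-fold CV on the training-plus-validation data that selects the hyperparameters maximizing mean validation accuracy) can be sketched as follows. The paper's group-based model is not available in code form, so this sketch substitutes a scikit-learn SVC with a small C grid as a placeholder for the model and its (M, K) grid, and runs on synthetic data; only the split structure mirrors the paper.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

# Synthetic stand-in data (the real study used 3,420 labels from 114 users).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Placeholder hyperparameter grid; the paper searches K in {4..7} and
# M in {20, 25, ..., 50} for its group-based model instead.
grid = [0.1, 1.0, 10.0]

outer = KFold(n_splits=10, shuffle=True, random_state=0)
outer_scores = []
for trval_idx, test_idx in outer.split(X):
    X_trval, y_trval = X[trval_idx], y[trval_idx]

    # Inner 5-fold CV: pick the grid point with the best mean validation accuracy.
    inner = KFold(n_splits=5, shuffle=True, random_state=1)
    best_c, best_acc = None, -1.0
    for c in grid:
        accs = []
        for tr_idx, val_idx in inner.split(X_trval):
            clf = SVC(C=c).fit(X_trval[tr_idx], y_trval[tr_idx])
            accs.append(clf.score(X_trval[val_idx], y_trval[val_idx]))
        if np.mean(accs) > best_acc:
            best_acc, best_c = float(np.mean(accs)), c

    # Refit on all training-plus-validation data with the chosen hyperparameter,
    # then score once on the held-out outer test fold.
    clf = SVC(C=best_c).fit(X_trval, y_trval)
    outer_scores.append(clf.score(X[test_idx], y[test_idx]))

print(round(float(np.mean(outer_scores)), 3))
```

Because each outer test fold is never touched during the inner grid search, the mean outer-fold accuracy is an unbiased estimate of the tuned model's performance, which is the role the paper's reported 79.31% plays.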