reproducibilityindex.ai

Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints

Authors: Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi

ICML 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even 80% of data deletion.
Researcher Affiliation	Collaboration	(1) Department of Computer Science, Yale University, New Haven, Connecticut, USA; (2) Google Research, Zurich, Switzerland.
Pseudocode	Yes	"Algorithm 1 ROBUST-CORESET-CENTRALIZED", "Algorithm 2 ROBUST-CENTRALIZED", "Algorithm 3 ROBUST-CORESET-STREAMING", "Algorithm 4 ROBUST-DISTRIBUTED"
Open Source Code	No	The paper does not provide any statement or link regarding the release of source code for the methodology described.
Open Datasets	Yes	"We extensively evaluate the performance of our algorithms on several publicly available real-world datasets." Specific dataset citations include "Uber Dataset. Uber Pickups in New York City. URL https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city." and "Adult Income dataset from UCI Repository (Blake & Merz, 1998)."
Dataset Splits	No	The paper mentions 16,281 test cases for the Adult Income dataset but does not provide specific percentages or counts for training or validation splits, nor details on the splitting methodology or cross-validation.
Hardware Specification	No	The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory, or cloud instances).
Software Dependencies	No	The paper mentions training Naive Bayes and SVM classifiers but does not specify any software libraries with version numbers (e.g., Python 3.x, PyTorch x.x, scikit-learn x.x).
Experiment Setup	No	The paper mentions some problem-specific parameters (e.g., d=5, k=20 for Uber; k=5, k=10 for Adult Income; m=12, d=25, epsilon=0.1 for Census1990) but does not provide concrete hyperparameter values or system-level training settings for the machine learning models (e.g., learning rate, batch size, optimizer, epochs, model initialization).