Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints

Authors: Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even 80% of data deletion.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, Yale University, New Haven, Connecticut, USA; (2) Google Research, Zurich, Switzerland.
Pseudocode | Yes | Algorithm 1 ROBUST-CORESET-CENTRALIZED; Algorithm 2 ROBUST-CENTRALIZED; Algorithm 3 ROBUST-CORESET-STREAMING; Algorithm 4 ROBUST-DISTRIBUTED
Open Source Code | No | The paper does not provide any statement or link regarding the release of source code for the methodology described.
Open Datasets | Yes | "We extensively evaluate the performance of our algorithms on several publicly available real-world datasets." and specific dataset citations like "Uber Dataset. Uber Pickups in New York City. URL https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city." and "Adult Income dataset from UCI Repository (Blake & Merz, 1998)."
Dataset Splits | No | The paper mentions 16,281 test cases for the Adult Income dataset but does not provide specific percentages or counts for training or validation splits, nor details on the splitting methodology or cross-validation.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions training Naive Bayes and SVM classifiers but does not specify any software libraries with version numbers (e.g., Python 3.x, PyTorch x.x, scikit-learn x.x).
Experiment Setup | No | The paper mentions some problem-specific parameters (e.g., d=5, k=20 for Uber; k=5, k=10 for Adult Income; m=12, d=25, epsilon=0.1 for Census1990) but does not provide concrete hyperparameter values or system-level training settings for the machine learning models (e.g., learning rate, batch size, optimizer, epochs, model initialization).