Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints
Authors: Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber-pick up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) robust to deletion summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even 80% of data deletion. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, Yale University, New Haven, Connecticut, USA; (2) Google Research, Zurich, Switzerland. |
| Pseudocode | Yes | "Algorithm 1 ROBUST-CORESET-CENTRALIZED", "Algorithm 2 ROBUST-CENTRALIZED", "Algorithm 3 ROBUST-CORESET-STREAMING", "Algorithm 4 ROBUST-DISTRIBUTED" |
| Open Source Code | No | The paper does not provide any statement or link regarding the release of source code for the methodology described. |
| Open Datasets | Yes | "We extensively evaluate the performance of our algorithms on several publicly available real-world datasets." and specific dataset citations like "Uber Dataset. Uber Pickups in New York City. URL https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city." and "Adult Income dataset from UCI Repository (Blake & Merz, 1998)." |
| Dataset Splits | No | The paper mentions 16,281 test cases for the Adult Income dataset but does not provide specific percentages or counts for training or validation splits, nor details on the splitting methodology or cross-validation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory, or cloud instances). |
| Software Dependencies | No | The paper mentions training Naive Bayes and SVM classifiers but does not specify any software libraries with version numbers (e.g., Python 3.x, PyTorch x.x, scikit-learn x.x). |
| Experiment Setup | No | The paper mentions some problem-specific parameters (e.g., d=5, k=20 for Uber; k=5, k=10 for Adult Income; m=12, d=25, epsilon=0.1 for Census1990) but does not provide concrete hyperparameter values or system-level training settings for the machine learning models (e.g., learning rate, batch size, optimizer, epochs, model initialization). |