Fair and Diverse DPP-Based Data Summarization
Authors: Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, Nisheeth Vishnoi
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on both real-world and synthetic datasets show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case. |
| Researcher Affiliation | Collaboration | 1 École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; 2 Microsoft Research, India; 3 UC Berkeley. |
| Pseudocode | Yes | Algorithm 1 Sample-And-Project |
| Open Source Code | No | The paper does not provide a link or explicit statement about the availability of the source code for the described methodology. |
| Open Datasets | Yes | We gathered a collection of images curated using Google image search as follows: Four search terms were used: (a) Scientist Male, (b) Scientist Female, (c) Painter Male, and (d) Painter Female (Imagedataset). ... The Adult income dataset (Blake & Merz, 1998) consists of roughly 45000 records of subjects... Data downloaded from https://archive.ics.uci.edu/ml/datasets/adult. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test splits for the datasets used in the experiments. It mentions sampling 'k samples' or 'sets of size 400' but not specific dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions 'vlfeat toolbox' and 'k-means algorithm' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We sample 40 images from each biased dataset... We conduct 200 repetitions. ... In preprocessing the data we filter out incomplete entries, and from the remaining ones we pick a random subset of 5000 records for our simulations. ... Sets of size 400 were selected, and 100 samples were taken for each. |
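The Research Type and Experiment Setup rows describe comparing the diversity of fairness-constrained samples against unconstrained ones over repeated draws. The following is a minimal sketch of that kind of comparison. It is not the authors' Sample-And-Project algorithm, whose details are not reproduced here. It assumes diversity is measured by det(X_S X_S^T), the standard k-DPP weight, and assumes the fair distribution is the same DPP restricted to subsets meeting a partition constraint (a fixed number of items per group). The synthetic data, group sizes, and helper names (`gram_det`, `satisfies_partition`, `kdpp_distribution`, `compare_diversity`) are hypothetical. Sampling uses exhaustive enumeration, which is only feasible for very small n.

```python
import itertools
import numpy as np


def gram_det(X, subset):
    """det(X_S X_S^T): squared volume spanned by the selected feature vectors."""
    V = X[list(subset)]
    return max(float(np.linalg.det(V @ V.T)), 0.0)  # clip tiny negative round-off


def satisfies_partition(subset, groups, counts):
    """Hypothetical fairness constraint: exactly counts[g] items from each group g."""
    tally = {g: 0 for g in counts}
    for i in subset:
        tally[groups[i]] += 1
    return tally == counts


def kdpp_distribution(X, k, groups=None, counts=None):
    """Enumerate all size-k subsets (optionally only those meeting the partition
    constraint) with P(S) proportional to det(X_S X_S^T). Exponential in n:
    for illustration on tiny inputs only."""
    subsets = [S for S in itertools.combinations(range(len(X)), k)
               if counts is None or satisfies_partition(S, groups, counts)]
    weights = np.array([gram_det(X, S) for S in subsets])
    return subsets, weights / weights.sum()


def compare_diversity(reps=200, seed=0):
    """Mean det of unconstrained vs. constrained samples on synthetic data."""
    rng = np.random.default_rng(seed)
    groups = [0] * 9 + [1] * 3              # group 0 over-represented (hypothetical)
    X = rng.normal(size=(len(groups), 5))   # 12 random feature vectors in R^5
    k, counts = 4, {0: 2, 1: 2}             # balanced target: 2 items per group

    results = {}
    for name, cons in [("unconstrained", None), ("fair", counts)]:
        subsets, probs = kdpp_distribution(X, k, groups, cons)
        draws = rng.choice(len(subsets), size=reps, p=probs)
        samples = [subsets[i] for i in draws]
        results[name] = (samples, np.mean([gram_det(X, S) for S in samples]))
    return results, groups, counts


if __name__ == "__main__":
    results, _, _ = compare_diversity()
    base = results["unconstrained"][1]
    fair = results["fair"][1]
    print(f"mean det unconstrained={base:.3f} fair={fair:.3f} ratio={fair / base:.3f}")
```

The ratio printed at the end plays the role of "how far" the constrained diversity is from the unconstrained case. Enumeration keeps the sketch exact and easy to check, but a practical implementation would need a polynomial-time sampler in place of `kdpp_distribution`.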