Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fair and Diverse DPP-Based Data Summarization
Authors: Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, Nisheeth Vishnoi
ICML 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on both real-world and synthetic datasets show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case. |
| Researcher Affiliation | Collaboration | 1 École Polytechnique Fédérale de Lausanne (EPFL), Switzerland 2 Microsoft Research, India 3 UC Berkeley. |
| Pseudocode | Yes | Algorithm 1 Sample-And-Project |
| Open Source Code | No | The paper does not provide a link or explicit statement about the availability of the source code for the described methodology. |
| Open Datasets | Yes | We gathered a collection of images curated using Google image search as follows: Four search terms were used: (a) Scientist Male, (b) Scientist Female, (c) Painter Male, and (d) Painter Female (Image dataset). ... The Adult income dataset (Blake & Merz, 1998) consists of roughly 45000 records of subjects... Data downloaded from https://archive.ics.uci.edu/ml/datasets/adult. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test splits for the datasets used in the experiments. It mentions sampling 'k samples' or 'sets of size 400' but not specific dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions 'vlfeat toolbox' and 'k-means algorithm' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We sample 40 images from each biased dataset... We conduct 200 repetitions. ... In preprocessing the data we filter out incomplete entries, and from the remaining ones we pick a random subset of 5000 records for our simulations. ... Sets of size 400 were selected, and 100 samples were taken for each. |
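
The Adult preprocessing step quoted under Experiment Setup (filter out incomplete entries, then pick a random subset of 5000 records) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' code, which is not available. The function name `preprocess`, the `seed` parameter, and the in-memory list-of-strings record format are assumptions; the `"?"` marker reflects how the UCI Adult files encode unknown values.

```python
import random


def preprocess(records, subset_size=5000, seed=0, missing="?"):
    """Drop incomplete records, then draw a random subset without replacement.

    records: list of rows, each row a list of string fields (assumed format).
    missing: marker for an unknown value; "?" is what the UCI Adult files use.
    seed: hypothetical parameter, added here only to make the sketch repeatable.
    """
    # A row is complete when no field is empty or equal to the missing marker.
    complete = [
        row for row in records
        if all(field.strip() not in ("", missing) for field in row)
    ]
    # If fewer complete rows remain than requested, keep them all.
    if len(complete) <= subset_size:
        return complete
    rng = random.Random(seed)
    return rng.sample(complete, subset_size)
```

The paper does not report a random seed, so the exact 5000-record subset used in the experiments cannot be recovered from this sketch.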