Fairness in Streaming Submodular Maximization over a Matroid Constraint

Authors: Marwa El Halabi, Federico Fusco, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We validate our findings empirically on a range of well-known real-world applications: exemplar-based clustering, movie recommendation, and maximum coverage in social networks. In this section, we empirically evaluate the performance of our algorithms on three applications: maximum coverage, movie recommendation, and exemplar-based clustering, with various choices of fairness and matroid constraints. In comparing our algorithms against two baselines, we consider objective values, as well as violations of fairness constraints, which we define for a given set S as err(S) = P c max{|S Vc| uc, ℓc |S Vc|, 0}. Each term in this sum counts the number of elements by which S violates the lower or upper bound. Note that err(S) is in the range [0, 2k]. We compare the following algorithms: TWOPASS-FAIR-STREAMING: using GREEDYFAIR-RESERVOIR in the first pass, and FAIRSTREAMING+ in the second pass with A = MATROIDINTERSECTION, explained below. GREEDY-FAIR-STREAMING: a one-pass heuristic algorithm based on the ideas of our two-pass algorithm (see Appendix A.3 for details). MATROID-INTERSECTION: streaming algorithm for submodular maximization over two matroid constraints (Chakrabarti & Kale, 2015) with I and IC. RANDOM: randomly selects a base set in I; no fairness constraints. We describe below the setup of our experiments. We select fairness bounds ℓc, uc which yield instances with feasible solutions, and enforce either that each color group Vc comprises a similar portion of the solution set S (in examplar-based clustering) or that they have a similar representation in S as in the entire dataset (in maximum coverage and movie recommendation). We report the results in Figure 1, and discuss them in Section 5.4.
Researcher Affiliation Collaboration 1Samsung SAIT AI Lab, Montreal 2Sapienza University of Rome 3Google Zurich 4EPFL 5Microsoft Research.
Pseudocode Yes Algorithm 1 FAIR-RESERVOIR, Algorithm 2 FAIR-STREAMING, Algorithm 3 GREEDY-FAIR-RESERVOIR, Algorithm 4 FAIR-STREAMING+, Algorithm 5 GREEDY-FAIR-STREAMING, Algorithm 6 GREEDY-FAIR-STREAMING-M
Open Source Code Yes The code is available at https://github. com/google-research/google-research/ tree/master/fair_submodular_matroid.
Open Datasets Yes We use the Pokec social network (Leskovec & Krevl, 2014) and We emulate a movie recommendation system using the Movielens 1M dataset (Harper & Konstan, 2016) and We consider a dataset containing 4,521 records of phone calls in a marketing campaign ran by a Portuguese banking institution (Moro et al., 2014).
Dataset Splits No The paper describes the partitioning of data for fairness and matroid constraints (e.g., “We partition users into four standard BMI categories”, “We partition the movies into 18 genres c”), but it does not provide details on traditional train/validation/test dataset splits (e.g., 80/10/10 percentages or specific sample counts) for reproducibility of model training in the typical supervised learning sense. The problem is streaming, so elements arrive sequentially, and the evaluation is on the full dataset with constraints.
Hardware Specification No The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies No The paper does not specify any software dependencies with version numbers, such as Python versions or library versions.
Experiment Setup Yes We describe below the setup of our experiments. We select fairness bounds ℓc, uc which yield instances with feasible solutions, and enforce either that each color group Vc comprises a similar portion of the solution set S (in examplar-based clustering) or that they have a similar representation in S as in the entire dataset (in maximum coverage and movie recommendation). We vary k between 10 and 200. We set ℓc = j 0.9 |Vc| /|V | k k and uc = l 1.5 |Vc| /|V | k m. We set our fairness bounds as ℓc = 0.1k + 2 and uc = 0.4k for each color 1 c 6. The parameter α = 0.85 interpolates between optimizing the coverage of the entire movie collection and selecting movies that maximize the user s score.