Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fairness in Social Influence Maximization via Optimal Transport

Authors: Shubham Chowdhary, Giulia De Pasquale, Nicolas Lanzetti, Ana-Andreea Stoica, Florian Dörfler

NeurIPS 2024

Reproducibility variables, each with the automated result and the supporting LLM-extracted excerpt:
Research Type: Experimental. "We propose a new seed-selection algorithm that optimizes both outreach and mutual fairness, and we show its efficacy on several real datasets. We find that our algorithm increases fairness with only a minor decrease (and at times, even an increase) in efficiency."
Researcher Affiliation: Academia. "Shubham Chowdhary, ETH Zürich, EMAIL; Giulia De Pasquale, Eindhoven University of Technology, EMAIL; Nicolas Lanzetti, ETH Zürich, EMAIL; Ana-Andreea Stoica, Max Planck Institute, Tübingen, EMAIL; Florian Dörfler, ETH Zürich, EMAIL"
Pseudocode: Yes. "Algorithm 1: Stochastic Seedset Selection Descent. Input: social graph G(V_G, E_G), initial seed set S_0, fairness weight β, tolerance ε. Output: optimal seed set S."
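The full algorithm is not reproduced in this extract; a minimal sketch of a stochastic seed-set descent of this general shape, with every function and parameter name invented here for illustration (the paper's actual update rule is in Algorithm 1 and Appendix E), might look like:

```python
import random

def seedset_descent(graph, s0, beta, eps, objective, max_iters=100, rng=None):
    """Stochastic local search over fixed-size seed sets (illustrative sketch).

    graph: adjacency dict {node: [neighbors]}; s0: initial seed set;
    beta: fairness weight passed through to the objective; eps: tolerance;
    objective: callable(seed_set, beta) -> scalar to maximize.
    """
    rng = rng or random.Random(0)
    nodes = list(graph)
    best = set(s0)
    best_val = objective(best, beta)
    for _ in range(max_iters):
        # Propose a neighboring seed set: swap one seed for a non-seed node.
        cand = set(best)
        cand.remove(rng.choice(sorted(cand)))
        cand.add(rng.choice([v for v in nodes if v not in best]))
        val = objective(cand, beta)
        if val > best_val + eps:  # accept only improvements beyond tolerance
            best, best_val = cand, val
    return best, best_val
```

On a toy star graph with a one-hop coverage objective, the search settles on the hub as the single best seed; the real objective in the paper combines outreach with the mutual-fairness term weighted by β.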
Open Source Code: Yes. "The code for all our numerical experiments is available at https://github.com/nicolaslanzetti/fairness-sim-ot."
Open Datasets: Yes. "We now investigate the use of our newly defined fairness metric across a variety of real-world datasets: Add Health (AH), Antelope Valley variants 0 to 23 (AV_{0-23}) [26], APS Physics (APS) [13], Deezer (DZ) [19], High School Gender (HS) [14], Indian Villages (IV) [1], and Instagram (INS) [22]. Each dataset contains a social network with a chosen demographic partitioning the population into two groups (see Appendix C for details)."
Dataset Splits: No. The paper reports R = 1,000 Monte Carlo simulations of the diffusion process but does not specify train/validation/test splits in the traditional machine-learning sense.
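The extract does not spell out which diffusion model the R = 1,000 simulations sample from; assuming a standard independent-cascade process with edge activation probability p, a Monte Carlo outreach estimate could be sketched as below, with all names illustrative:

```python
import random

def independent_cascade(graph, seeds, p, rng):
    """One simulated diffusion: each newly active node tries once to
    activate each inactive neighbor with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def mc_outreach(graph, seeds, p, R=1000, seed=0):
    """Average outreach (number of activated nodes) over R simulations."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(R))
    return total / R
```

With p = 1 every reachable node activates, and with p = 0 only the seeds do, which gives a quick sanity check on the estimator.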
Hardware Specification: Yes. "All experiments were performed on a local PC on a single 3.5 GHz CPU core. Except for the datasets DZ and INS, all datasets were loaded and operated on a local PC with 32 GB of RAM. For the largest datasets (DZ, INS), we used remote compute clusters with 64 GB of memory and similar CPU capabilities."
Software Dependencies: Yes. "For the code development, we broadly used Python 3.10+, numpy, jupyter, and networkx [8]."
Experiment Setup: Yes. "We keep R = 1,000 throughout, but explore several values of p and |S| (mentioned per experiment in the figures below), exhaustively recorded with other hyperparameters in Appendix D. All details related to computational resources and development environment are available in Appendix G. The code for all our numerical experiments is available at https://github.com/nicolaslanzetti/fairness-sim-ot. ... For S3D (refer to Appendix E), we use the constants exploit_to_explore = 1.3, non_acceptance_retention_prob = 0.95, and shallow_horizon = 4."
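The quoted S3D constants can be collected into one configuration mapping. The names and values below are taken verbatim from the setup excerpt above; grouping them this way, and the surrounding comments on their roles, are our own illustration rather than the repository's actual layout:

```python
# Hyperparameters as reported in the paper's experiment setup.
# How each constant enters the S3D routine is described in Appendix E
# of the paper; the role comments here are paraphrases of the names.
S3D_CONFIG = {
    "R": 1_000,                            # Monte Carlo simulations per evaluation
    "exploit_to_explore": 1.3,             # exploitation/exploration ratio
    "non_acceptance_retention_prob": 0.95, # chance of keeping a rejected state
    "shallow_horizon": 4,                  # lookahead depth of the search
}
```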