Policy Aggregation

Authors: Parand A. Alamdari, Soroush Ebadian, Ariel D. Procaccia

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, our experiments in Section 7 evaluate the policies returned by different rules based on their fairness; the results identify quantile fairness as especially appealing. The experiments also illustrate the advantage of our approach over rules that optimize measures of social welfare (which are sensitive to affine transformations of the rewards).
Researcher Affiliation | Academia | Parand A. Alamdari, University of Toronto & Vector Institute (parand@cs.toronto.edu); Soroush Ebadian, University of Toronto (soroush@cs.toronto.edu); Ariel D. Procaccia, Harvard University (arielpro@seas.harvard.edu)
Pseudocode | Yes | ALGORITHM 1: Seq. ϵ-Prop. Veto Core [7]; ALGORITHM 2: ϵ-Max Quantile Fairness Procedure; ALGORITHM 3: α-Approvals MILP; ALGORITHM 4: ϵ-Borda count MILP
Open Source Code | Yes | The code for the experiments is available at https://github.com/praal/policy-aggregation.
Open Datasets | Yes | We adapt the dynamic attention allocation environment introduced by D'Amour et al. [11].
Dataset Splits | No | The paper does not specify training/validation/test splits (no percentages or counts). It describes an environment and samples policies for evaluation, rather than partitioning data for supervised model training.
Hardware Specification | Yes | Experiments are all done on an AMD EPYC 7502 32-Core Processor with 258 GiB of system memory. We use Gurobi [18] to solve LPs and MILPs.
Software Dependencies | Yes | We use Gurobi [18] to solve LPs and MILPs (see the LP sketch after the table).
Experiment Setup | Yes | We sample 5 × 10^5 random policies, based on which we fit a generalized logistic function to estimate the CDF of the expected return distribution F_i (Definition 4) for every agent (see the fitting sketch after the table). The policies for α-approval voting rules are optimized with respect to maximum utilitarian welfare. The egalitarian rule finds a policy that maximizes the expected return of the worst-off agent, then optimizes for the second worst-off agent, and so on. The implementation details of Borda count are in Appendix D.
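
The CDF-estimation step described in the Experiment Setup row can be illustrated with a short, hedged sketch. The snippet below is not the authors' implementation: the sampled returns are a random placeholder rather than returns of sampled policies, and the four-parameter `generalized_logistic` form, its initial guesses, and the name `F_i` are illustrative assumptions. It only shows how a generalized logistic curve could be fit to the empirical CDF of sampled expected returns with `scipy.optimize.curve_fit`.

```python
# Hedged sketch: estimate an agent's CDF F_i of expected returns by fitting
# a generalized logistic curve to the empirical CDF of sampled policy returns.
# The parameterization and the placeholder sample data are assumptions.
import numpy as np
from scipy.optimize import curve_fit

def generalized_logistic(x, a, k, b, x0):
    """Four-parameter logistic: lower asymptote a, upper asymptote k,
    growth rate b, inflection location x0."""
    return a + (k - a) / (1.0 + np.exp(-b * (x - x0)))

rng = np.random.default_rng(0)
# Placeholder for the expected returns of 5e5 sampled policies for one agent.
returns = rng.normal(loc=10.0, scale=2.0, size=500_000)

# Empirical CDF points.
xs = np.sort(returns)
ys = np.arange(1, len(xs) + 1) / len(xs)

# Subsample the empirical CDF for a cheaper fit.
idx = np.linspace(0, len(xs) - 1, 2_000).astype(int)
params, _ = curve_fit(
    generalized_logistic, xs[idx], ys[idx],
    p0=[0.0, 1.0, 1.0, float(np.median(xs))], maxfev=10_000,
)

def F_i(x):
    """Estimated CDF of agent i's expected return (as used for quantile fairness)."""
    return np.clip(generalized_logistic(x, *params), 0.0, 1.0)

print(F_i(np.array([8.0, 10.0, 12.0])))
```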
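
The Gurobi dependency and the first stage of the egalitarian rule can likewise be sketched. The LP below, over discounted state-action occupancy measures, maximizes the worst-off agent's expected return with gurobipy; the tiny random MDP, the 1/(1 - γ) normalization, and all variable names are assumptions for illustration rather than the paper's formulation, and the later leximin stages (fixing the worst-off value and re-optimizing for the next agent) are omitted.

```python
# Hedged sketch: max-min (egalitarian, stage 1) policy aggregation as an LP over
# discounted state-action occupancy measures, solved with gurobipy.
# The tabular MDP and rewards below are made-up placeholders.
import numpy as np
import gurobipy as gp
from gurobipy import GRB

n_states, n_actions, n_agents, gamma = 3, 2, 2, 0.95
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_agents, n_states, n_actions))   # R[i, s, a]
mu0 = np.full(n_states, 1.0 / n_states)                           # initial state distribution

m = gp.Model("egalitarian_stage1")
d = m.addVars(n_states, n_actions, lb=0.0, name="d")   # occupancy measures d(s, a)
t = m.addVar(lb=-GRB.INFINITY, name="t")               # worst-off expected return

# Bellman-flow constraints defining a valid discounted occupancy measure.
for s in range(n_states):
    m.addConstr(
        gp.quicksum(d[s, a] for a in range(n_actions))
        == (1 - gamma) * mu0[s]
        + gamma * gp.quicksum(P[s2, a, s] * d[s2, a]
                              for s2 in range(n_states) for a in range(n_actions))
    )

# t lower-bounds every agent's expected return (scaled by 1 - gamma).
for i in range(n_agents):
    m.addConstr(
        (1 - gamma) * t
        <= gp.quicksum(R[i, s, a] * d[s, a]
                       for s in range(n_states) for a in range(n_actions))
    )

m.setObjective(t, GRB.MAXIMIZE)
m.optimize()

# Recover a stochastic policy pi(a | s) by normalizing the occupancy measure row-wise.
occ = np.array([[d[s, a].X for a in range(n_actions)] for s in range(n_states)])
pi = occ / occ.sum(axis=1, keepdims=True)
print("worst-off expected return:", t.X)
print("policy:", pi)
```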