Policy Aggregation
Authors: Parand A. Alamdari, Soroush Ebadian, Ariel D. Procaccia
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, our experiments in Section 7 evaluate the policies returned by different rules based on their fairness; the results identify quantile fairness as especially appealing. The experiments also illustrate the advantage of our approach over rules that optimize measures of social welfare (which are sensitive to affine transformations of the rewards). (A small numeric illustration of this sensitivity follows the table.) |
| Researcher Affiliation | Academia | Parand A. Alamdari University of Toronto & Vector Institute parand@cs.toronto.edu Soroush Ebadian University of Toronto soroush@cs.toronto.edu Ariel D. Procaccia Harvard University arielpro@seas.harvard.edu |
| Pseudocode | Yes | ALGORITHM 1: Seq. ϵ-Prop. Veto Core [7]; ALGORITHM 2: ϵ-Max Quantile Fairness Procedure; ALGORITHM 3: α-Approvals MILP; ALGORITHM 4: ϵ-Borda count MILP |
| Open Source Code | Yes | The code for the experiments is available at https://github.com/praal/policy-aggregation. |
| Open Datasets | Yes | We adapt the dynamic attention allocation environment introduced by D'Amour et al. [11]. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or counts. It describes an environment and policy sampling for evaluation, but not data partitioning for model training in a supervised learning sense. |
| Hardware Specification | Yes | Experiments are all done on an AMD EPYC 7502 32-Core Processor with 258 GiB system memory. We use Gurobi [18] to solve LPs and MILPs. |
| Software Dependencies | Yes | We use Gurobi [18] to solve LPs and MILPs. |
| Experiment Setup | Yes | We sample 5 × 10^5 random policies based on which we fit a generalized logistic function to estimate the CDF of the expected return distribution F_i (Definition 4) for every agent. The policies for α-approval voting rules are optimized with respect to maximum utilitarian welfare. The egalitarian rule finds a policy that maximizes the expected return of the worst-off agent, then optimizes for the second worst-off agent, and so on. The implementation details of Borda count are in Appendix D. (A CDF-fitting sketch follows the table.) |
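
The Research Type row quotes the paper's remark that rules optimizing social welfare are sensitive to affine transformations of the rewards. The following is a minimal numeric sketch of that sensitivity; the agents, policies, and return values are invented for illustration and are not taken from the paper's environment.

```python
# Two agents' expected returns under two candidate policies (hypothetical values).
returns = {
    "agent1": {"A": 1.0, "B": 0.0},
    "agent2": {"A": 0.4, "B": 0.9},
}

def utilitarian_winner(returns):
    # Policy maximizing the sum of expected returns across agents.
    totals = {}
    for agent_returns in returns.values():
        for policy, value in agent_returns.items():
            totals[policy] = totals.get(policy, 0.0) + value
    return max(totals, key=totals.get)

print(utilitarian_winner(returns))  # -> "A" (totals: A = 1.4, B = 0.9)

# Apply a positive affine transformation (x -> 10 * x) to agent2's rewards only.
returns["agent2"] = {p: 10 * v for p, v in returns["agent2"].items()}

# The utilitarian choice flips to "B" (totals: A = 5.0, B = 9.0), even though
# each agent's own ranking over the two policies is unchanged.
print(utilitarian_winner(returns))  # -> "B"
```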
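The Experiment Setup row describes estimating each agent's expected-return CDF F_i by fitting a generalized logistic function to returns of sampled policies. The sketch below shows one plausible way such a fit could be done, assuming a Richards-style generalized logistic parameterization, SciPy's `curve_fit`, and synthetic return data; the function names, initial guesses, and data are placeholders, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Placeholder for the expected returns of sampled policies for one agent;
# in the paper these come from 5e5 randomly sampled policies.
sampled_returns = rng.normal(loc=10.0, scale=2.0, size=500_000)

def generalized_logistic(x, loc, rate, nu):
    # Richards-style generalized logistic curve, bounded in (0, 1).
    return 1.0 / (1.0 + np.exp(-rate * (x - loc))) ** nu

# Empirical CDF points (subsampled to keep the fit cheap).
xs = np.sort(rng.choice(sampled_returns, size=5_000, replace=False))
ecdf = np.arange(1, xs.size + 1) / xs.size

params, _ = curve_fit(
    generalized_logistic, xs, ecdf,
    p0=[xs.mean(), 1.0, 1.0], maxfev=10_000,
)

def F_i(r):
    # Estimated probability that a random policy gives this agent a return <= r,
    # i.e. the quantile of a given expected return.
    return generalized_logistic(r, *params)

print(F_i(12.0))
```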