Position: Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized
Authors: Shomik Jain, Kathleen Creel, Ashia Camage Wilson
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We simulate how much randomization can reduce SER for various distributions of claims, when each decision-maker has a noisy estimate of these claims (ε ∼ N(0, σ²)). As Figure (a) illustrates, we consider the following distributions: Uniform (all claims equally likely); Normal (more average claims); Inverted Normal (more strong and weak claims); Pareto (more weak claims); Inverted Pareto (more strong claims). For all these distributions and many different selection rates ... [see the simulation sketch below the table] |
| Researcher Affiliation | Academia | ¹Institute for Data, Systems, and Society, MIT; ²Department of Philosophy & Religion and Khoury College of Computer Sciences, Northeastern University; ³Department of Electrical Engineering and Computer Science, MIT. |
| Pseudocode | Yes | A.3. Pseudocode for Randomization Proposals: Algorithm 1, Partial BF Lottery; Algorithm 2, Randomization Using Variance; Algorithm 3, Randomization Using Outliers. [see the Partial BF Lottery sketch below the table] |
| Open Source Code | Yes | We share the code for our randomization methods and experiments at: https://github.com/shomikj/randomization_for_fairness. |
| Open Datasets | Yes | We test our randomization proposals on 2 datasets: (1) Swiss Unemployment Data (Lechner et al., 2020), and (2) Census Income Data (Ding et al., 2021). [see the data-loading and split sketch below the table] |
| Dataset Splits | No | The paper states, 'All our experiments involve an 80-20 train-test split (with 5 repetitions),' but it does not explicitly mention a separate validation split or how validation was handled. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software versions for libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | For our main analysis, we use a selection rate of k/n = 0.25 and explore other selection rates in the Appendix (which yield similar results). ... We find small tradeoffs with utility that are very similar to those for expected utility that we saw when claims are known and normally distributed (c.f. Figure 1d). For example, we observe just a 0.8% drop in utility for partial randomization with k̄ = 0.5k and n̄ = k, which randomizes half the available resources across the k closest predictions to the decision boundary on either side [see the Partial BF Lottery sketch below]. ... We contend that if any of these models placed an individual among the top-k claims, then they should have a chance to receive o_i = 1. Specifically, we propose directly assigning o_i = 1 to individuals placed in the top-k by all models, and then conducting an iterative weighted selection among the remaining individuals, where the weights represent the proportion of models that placed them in the top-k [see the model-multiplicity sketch below]. |
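
A minimal sketch of the simulation described in the Research Type row, assuming SER is measured as the fraction of individuals that none of m independent noisy decision-makers ever selects; the noise scale, lottery pool size, and all names here are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np

def simulate_ser(claims, k, m=10, sigma=0.5, lottery=False, seed=None):
    """Fraction of individuals never selected by any of m noisy decision-makers.

    Each decision-maker observes claims + N(0, sigma^2) noise and awards k
    resources to its top-k estimates; with `lottery`, it instead randomizes
    uniformly over its top-2k estimates (one simple randomization scheme).
    """
    rng = np.random.default_rng(seed)
    n = len(claims)
    ever_selected = np.zeros(n, dtype=bool)
    for _ in range(m):
        noisy = claims + rng.normal(0.0, sigma, size=n)
        if lottery:
            pool = np.argsort(noisy)[-2 * k:]          # near-boundary pool
            winners = rng.choice(pool, size=k, replace=False)
        else:
            winners = np.argsort(noisy)[-k:]           # deterministic top-k
        ever_selected[winners] = True
    return 1.0 - ever_selected.mean()                  # systemic exclusion rate

rng = np.random.default_rng(0)
n, k = 1000, 250                                       # selection rate k/n = 0.25
claim_distributions = {
    "uniform": rng.uniform(0, 1, n),                   # all claims equally likely
    "normal": rng.normal(0.5, 0.15, n),                # more average claims
    "pareto": rng.pareto(3.0, n),                      # more weak claims
}
for name, claims in claim_distributions.items():
    print(name,
          "deterministic SER:", simulate_ser(claims, k, seed=1),
          "lottery SER:", simulate_ser(claims, k, lottery=True, seed=1))
```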
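
One plausible reading of Algorithm 1 (Partial BF Lottery), matching the parameters quoted in the Experiment Setup row: the top k − k̄ predictions win deterministically, and the remaining k̄ resources are lotteried uniformly over the next n̄ ranked individuals. This is a hedged sketch with assumed parameter names; the authors' exact pseudocode is in their Appendix A.3 and repository.

```python
import numpy as np

def partial_bf_lottery(scores, k, k_bar, n_bar, seed=None):
    """Allocate k resources: top (k - k_bar) deterministically, then a
    uniform lottery of k_bar resources over the next n_bar ranked people.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(scores)[::-1]                 # ranks, best first
    certain = order[: k - k_bar]                     # guaranteed winners
    pool = order[k - k_bar : k - k_bar + n_bar]      # near-boundary lottery pool
    lottery = rng.choice(pool, size=k_bar, replace=False)
    outcome = np.zeros(len(scores), dtype=int)       # o_i for each individual
    outcome[certain] = 1
    outcome[lottery] = 1
    return outcome

scores = np.random.default_rng(0).normal(size=100)
k = 25
o = partial_bf_lottery(scores, k=k, k_bar=k // 2, n_bar=k, seed=1)
assert o.sum() == k
```

With k̄ = 0.5k and n̄ = k, this randomizes half the resources across the k predictions closest to the decision boundary, the configuration behind the 0.8% utility-drop figure quoted above.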
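
For the Open Datasets and Dataset Splits rows: the Census Income Data (Ding et al., 2021) ships with the folktables package, and the quoted "80-20 train-test split (with 5 repetitions)" maps naturally onto scikit-learn. The survey year, state, and seeds below are illustrative assumptions, not the paper's configuration.

```python
from folktables import ACSDataSource, ACSIncome
from sklearn.model_selection import train_test_split

# Census Income Data (Ding et al., 2021) via folktables; the state/year
# choices here are placeholders, not necessarily what the paper used.
data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_data = data_source.get_data(states=["CA"], download=True)
features, labels, _group = ACSIncome.df_to_numpy(acs_data)

# "80-20 train-test split (with 5 repetitions)": one split per seed.
for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed
    )
    # ... train a classifier and apply a randomization proposal here ...
```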
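
Lastly, a sketch of the model-multiplicity proposal quoted in the Experiment Setup row: anyone in every model's top-k receives o_i = 1 outright, and the remaining slots are filled by iterative weighted sampling with weights equal to the proportion of models placing each person in their top-k. The function name and interface are assumptions.

```python
import numpy as np

def multiplicity_selection(score_matrix, k, seed=None):
    """score_matrix: (n_models, n_people). Returns indices of k winners."""
    rng = np.random.default_rng(seed)
    n_models, n_people = score_matrix.shape
    in_top_k = np.zeros((n_models, n_people), dtype=bool)
    for m, scores in enumerate(score_matrix):
        in_top_k[m, np.argsort(scores)[-k:]] = True
    weights = in_top_k.mean(axis=0)          # proportion of models selecting i

    # Guaranteed winners: in the top-k under every model.
    winners = [int(i) for i in np.flatnonzero(weights == 1.0)]
    remaining = [int(i) for i in np.flatnonzero(weights > 0) if weights[i] < 1.0]
    # Iterative weighted selection over everyone else with a top-k claim.
    while len(winners) < k and remaining:
        p = weights[remaining] / weights[remaining].sum()
        pick = int(rng.choice(remaining, p=p))
        winners.append(pick)
        remaining.remove(pick)
    return np.array(winners[:k])

scores = np.random.default_rng(0).normal(size=(5, 100))  # 5 rival models
print(multiplicity_selection(scores, k=25, seed=1))
```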