Position: Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized

Authors: Shomik Jain, Kathleen Creel, Ashia Camage Wilson

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We simulate how much randomization can reduce SER for various distributions of claims, when each decision-maker has a noisy estimate of these claims (noise ∼ N(0, σ²)). As Figure (a) illustrates, we consider the following distributions: Uniform (all claims equally likely); Normal (more average claims); Inverted Normal (more strong and weak claims); Pareto (more weak claims); Inverted Pareto (more strong claims). For all these distributions and many different selection rates ...
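The simulation described above can be sketched in a few lines. This is a simplified stand-in for the paper's experiment, not the authors' code: each of several decision-makers sees claims corrupted by N(0, σ²) noise and selects the top k, and we measure how often the individuals with the truly strongest claims are selected by none of them (a rough proxy for the paper's systemic exclusion rate, SER). The "lottery over the top 2k noisy candidates" used here as the randomized variant is one simple scheme of our choosing, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def systemic_exclusion_rate(claims, n_makers=5, k=None, sigma=1.0, randomize=False):
    """Fraction of true top-k-claim individuals selected by none of
    n_makers decision-makers, each observing claims + N(0, sigma^2) noise.
    Simplified proxy for the paper's SER metric."""
    claims = np.asarray(claims)
    n = len(claims)
    k = k or n // 4                                  # selection rate k/n = 0.25
    selected = np.zeros((n_makers, n), dtype=bool)
    for m in range(n_makers):
        noisy = claims + rng.normal(0.0, sigma, n)   # noisy claim estimates
        order = np.argsort(-noisy)
        if randomize:
            # uniform lottery over the top-2k noisy candidates
            # (an illustrative scheme, not the paper's algorithm)
            pick = rng.choice(order[:2 * k], size=k, replace=False)
        else:
            pick = order[:k]                         # deterministic top-k
        selected[m, pick] = True
    deserving = np.argsort(-claims)[:k]              # true top-k claims
    return (~selected.any(axis=0))[deserving].mean()

claims = rng.normal(size=1000)                       # "Normal" claim distribution
ser_det = systemic_exclusion_rate(claims, randomize=False)
ser_rand = systemic_exclusion_rate(claims, randomize=True)
```

Swapping the claim-generating line for `rng.uniform(...)` or `rng.pareto(...)` reproduces the other distributions the report quotes.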
Researcher Affiliation | Academia | ¹Institute for Data, Systems, and Society, MIT; ²Department of Philosophy & Religion and Khoury College of Computer Sciences, Northeastern University; ³Department of Electrical Engineering and Computer Science, MIT.
Pseudocode | Yes | A.3. Pseudocode for Randomization Proposals: Algorithm 1 (Partial BF Lottery), Algorithm 2 (Randomization Using Variance), Algorithm 3 (Randomization Using Outliers).
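To make the flavor of these algorithms concrete, here is a minimal sketch of a partial lottery in the spirit of Algorithm 1: the clearly strongest scorers receive a resource outright, and the remaining slots are raffled uniformly among the candidates straddling the decision boundary. Parameter names (`k_rand`, the size of the randomized pool) are ours, not the paper's; the authors' exact procedure is in their Appendix A.3.

```python
import numpy as np

def partial_lottery(scores, k, k_rand, rng=None):
    """Allocate k resources: the top (k - k_rand) scorers win outright;
    the last k_rand resources are raffled uniformly among the 2*k_rand
    candidates nearest the decision boundary (k_rand on either side).
    Illustrative sketch, not the paper's Algorithm 1 verbatim."""
    rng = rng or np.random.default_rng()
    order = np.argsort(-np.asarray(scores))          # best score first
    sure = order[:k - k_rand]                        # deterministic winners
    pool = order[k - k_rand : k + k_rand]            # boundary lottery pool
    raffled = rng.choice(pool, size=k_rand, replace=False)
    return np.concatenate([sure, raffled])

scores = np.random.default_rng(1).normal(size=100)
winners = partial_lottery(scores, k=25, k_rand=12, rng=np.random.default_rng(2))
```

Setting `k_rand = 0` recovers the usual deterministic top-k rule, and `k_rand = k` is a full lottery over the top 2k candidates.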
Open Source Code | Yes | We share the code for our randomization methods and experiments at: https://github.com/shomikj/randomization_for_fairness.
Open Datasets | Yes | We test our randomization proposals on 2 datasets: (1) Swiss Unemployment Data (Lechner et al., 2020), and (2) Census Income Data (Ding et al., 2021).
Dataset Splits | No | The paper states, 'All our experiments involve an 80-20 train-test split (with 5 repetitions),' but it does not explicitly mention a separate validation split or how validation was handled.
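The quoted protocol, five independent 80-20 train/test splits with no separate validation fold, can be reproduced with plain permutations. This is a generic sketch of that protocol, not the authors' code:

```python
import numpy as np

def repeated_splits(n, test_frac=0.2, reps=5, seed=0):
    """Return `reps` independent (train_idx, test_idx) pairs over n samples.
    Mirrors the reported 80-20 train-test split with 5 repetitions;
    no validation fold is carved out, matching the report's note."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(reps):
        perm = rng.permutation(n)                  # fresh shuffle per repetition
        cut = int(n * (1 - test_frac))             # 80% boundary
        splits.append((perm[:cut], perm[cut:]))
    return splits

splits = repeated_splits(100)
```

Fixing the seed makes all five repetitions reproducible, which is the detail a validation-split description would normally pin down.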
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify any software versions for libraries, frameworks, or programming languages used in the experiments.
Experiment Setup | Yes | For our main analysis, we use a selection rate of k/n = 0.25 and explore other selection rates in the Appendix (which yield similar results). ... We find small tradeoffs with utility, very similar to those for expected utility when claims are known and normally distributed (c.f. Figure 1d). For example, we observe just a 0.8% drop in utility for partial randomization with k′ = 0.5k and n′ = k, which randomizes half the available resources across the k closest predictions to the decision boundary on either side. ... We contend that if any of these models placed an individual among the top k claims, then they should have a chance to receive oᵢ = 1. Specifically, we propose directly assigning oᵢ = 1 to individuals placed in the top k by all models, and then conducting an iterative weighted selection among the remaining individuals, where the weights represent the proportion of models that placed them in the top k.
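The final proposal quoted above, unanimous top-k picks receive oᵢ = 1 outright, then remaining slots are filled by iterative weighted selection, translates into a short routine. This is a sketch under our reading of that quote, not the authors' implementation; the input format (a boolean model-by-individual top-k matrix) is our choice:

```python
import numpy as np

def ensemble_selection(topk_matrix, k, rng=None):
    """topk_matrix[m, i] is True iff model m places individual i in its top k.
    Individuals endorsed by every model are selected outright; remaining
    slots are filled one at a time by weighted sampling without replacement,
    with weight = fraction of models endorsing the individual.
    Sketch of the quoted proposal, not the paper's code."""
    rng = rng or np.random.default_rng()
    weights = topk_matrix.mean(axis=0)               # endorsement fractions
    selected = list(np.flatnonzero(weights == 1.0))[:k]   # unanimous picks
    remaining = list(np.flatnonzero((weights > 0) & (weights < 1.0)))
    while len(selected) < k and remaining:
        w = np.array([weights[i] for i in remaining])
        pick = int(rng.choice(remaining, p=w / w.sum()))  # iterative weighted draw
        selected.append(pick)
        remaining.remove(pick)
    return selected

# 3 models, 6 individuals: individual 0 is in every model's top 3
topk = np.array([[1, 1, 1, 0, 0, 0],
                 [1, 1, 0, 1, 0, 0],
                 [1, 0, 1, 0, 1, 0]], dtype=bool)
chosen = ensemble_selection(topk, k=3, rng=np.random.default_rng(0))
```

Individuals in no model's top k have weight 0 and can never be drawn, which matches the quoted contention that only those placed in some model's top k deserve a chance.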