Position: Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized
Authors: Shomik Jain, Kathleen Creel, Ashia Camage Wilson
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We simulate how much randomization can reduce SER for various distributions of claims, when each decision-maker has a noisy estimate of these claims (ε ∼ N(0, σ²)). As Figure (a) illustrates, we consider the following distributions: Uniform (all claims equally likely); Normal (more average claims); Inverted Normal (more strong and weak claims); Pareto (more weak claims); Inverted Pareto (more strong claims). For all these distributions and many different selection rates ... [see the simulation sketch below the table] |
| Researcher Affiliation | Academia | ¹Institute for Data, Systems, and Society, MIT; ²Department of Philosophy & Religion and Khoury College of Computer Sciences, Northeastern University; ³Department of Electrical Engineering and Computer Science, MIT. |
| Pseudocode | Yes | A.3. Pseudocode for Randomization Proposals: Algorithm 1, Partial BF Lottery; Algorithm 2, Randomization Using Variance; Algorithm 3, Randomization Using Outliers. [see the Partial BF Lottery sketch below the table] |
| Open Source Code | Yes | We share the code for our randomization methods and experiments at: https://github.com/shomikj/randomization_for_fairness. |
| Open Datasets | Yes | We test our randomization proposals on 2 datasets: (1) Swiss Unemployment Data (Lechner et al., 2020), and (2) Census Income Data (Ding et al., 2021). [see the data-loading and split sketch below the table] |
| Dataset Splits | No | The paper states, 'All our experiments involve an 80-20 train-test split (with 5 repetitions),' but it does not explicitly mention a separate validation split or how validation was handled. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software versions for libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | For our main analysis, we use a selection rate of k/n = 0.25 and explore other selection rates in the Appendix (which yield similar results). ... We find small tradeoffs with utility that are very similar to those for expected utility that we saw when claims are known and normally distributed (c.f. Figure 1d). For example, we observe just a 0.8% drop in utility for partial randomization with k̄ = 0.5k and n̄ = k, which randomizes half the available resources across the k closest predictions to the decision boundary on either side [see the Partial BF Lottery sketch below]. ... We contend that if any of these models placed an individual among the top-k claims, then they should have a chance to receive o_i = 1. Specifically, we propose directly assigning o_i = 1 to individuals placed in the top-k by all models, and then conducting an iterative weighted selection among the remaining individuals, where the weights represent the proportion of models that placed them in the top-k [see the model-multiplicity sketch below]. |
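
A minimal sketch of the simulation described in the Research Type row, assuming SER is measured as the fraction of individuals that none of m independent noisy decision-makers ever selects; the noise scale, lottery pool size, and all names here are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np

def simulate_ser(claims, k, m=10, sigma=0.5, lottery=False, seed=None):
    """Fraction of individuals never selected by any of m noisy decision-makers.

    Each decision-maker observes claims + N(0, sigma^2) noise and awards k
    resources to its top-k estimates; with `lottery`, it instead randomizes
    uniformly over its top-2k estimates (one simple randomization scheme).
    """
    rng = np.random.default_rng(seed)
    n = len(claims)
    ever_selected = np.zeros(n, dtype=bool)
    for _ in range(m):
        noisy = claims + rng.normal(0.0, sigma, size=n)
        if lottery:
            pool = np.argsort(noisy)[-2 * k:]          # near-boundary pool
            winners = rng.choice(pool, size=k, replace=False)
        else:
            winners = np.argsort(noisy)[-k:]           # deterministic top-k
        ever_selected[winners] = True
    return 1.0 - ever_selected.mean()                  # systemic exclusion rate

rng = np.random.default_rng(0)
n, k = 1000, 250                                       # selection rate k/n = 0.25
claim_distributions = {
    "uniform": rng.uniform(0, 1, n),                   # all claims equally likely
    "normal": rng.normal(0.5, 0.15, n),                # more average claims
    "pareto": rng.pareto(3.0, n),                      # more weak claims
}
for name, claims in claim_distributions.items():
    print(name,
          "deterministic SER:", simulate_ser(claims, k, seed=1),
          "lottery SER:", simulate_ser(claims, k, lottery=True, seed=1))
```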
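
One plausible reading of Algorithm 1 (Partial BF Lottery), matching the parameters quoted in the Experiment Setup row: the top k − k̄ predictions win deterministically, and the remaining k̄ resources are lotteried uniformly over the next n̄ ranked individuals. This is a hedged sketch with assumed parameter names; the authors' exact pseudocode is in their Appendix A.3 and repository.

```python
import numpy as np

def partial_bf_lottery(scores, k, k_bar, n_bar, seed=None):
    """Allocate k resources: top (k - k_bar) deterministically, then a
    uniform lottery of k_bar resources over the next n_bar ranked people.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(scores)[::-1]                 # ranks, best first
    certain = order[: k - k_bar]                     # guaranteed winners
    pool = order[k - k_bar : k - k_bar + n_bar]      # near-boundary lottery pool
    lottery = rng.choice(pool, size=k_bar, replace=False)
    outcome = np.zeros(len(scores), dtype=int)       # o_i for each individual
    outcome[certain] = 1
    outcome[lottery] = 1
    return outcome

scores = np.random.default_rng(0).normal(size=100)
k = 25
o = partial_bf_lottery(scores, k=k, k_bar=k // 2, n_bar=k, seed=1)
assert o.sum() == k
```

With k̄ = 0.5k and n̄ = k, this randomizes half the resources across the k predictions closest to the decision boundary, the configuration behind the 0.8% utility-drop figure quoted above.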
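
For the Open Datasets and Dataset Splits rows: the Census Income Data (Ding et al., 2021) ships with the folktables package, and the quoted "80-20 train-test split (with 5 repetitions)" maps naturally onto scikit-learn. The survey year, state, and seeds below are illustrative assumptions, not the paper's configuration.

```python
from folktables import ACSDataSource, ACSIncome
from sklearn.model_selection import train_test_split

# Census Income Data (Ding et al., 2021) via folktables; the state/year
# choices here are placeholders, not necessarily what the paper used.
data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_data = data_source.get_data(states=["CA"], download=True)
features, labels, _group = ACSIncome.df_to_numpy(acs_data)

# "80-20 train-test split (with 5 repetitions)": one split per seed.
for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=seed
    )
    # ... train a classifier and apply a randomization proposal here ...
```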
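
Lastly, a sketch of the model-multiplicity proposal quoted in the Experiment Setup row: anyone in every model's top-k receives o_i = 1 outright, and the remaining slots are filled by iterative weighted sampling with weights equal to the proportion of models placing each person in their top-k. The function name and interface are assumptions.

```python
import numpy as np

def multiplicity_selection(score_matrix, k, seed=None):
    """score_matrix: (n_models, n_people). Returns indices of k winners."""
    rng = np.random.default_rng(seed)
    n_models, n_people = score_matrix.shape
    in_top_k = np.zeros((n_models, n_people), dtype=bool)
    for m, scores in enumerate(score_matrix):
        in_top_k[m, np.argsort(scores)[-k:]] = True
    weights = in_top_k.mean(axis=0)          # proportion of models selecting i

    # Guaranteed winners: in the top-k under every model.
    winners = [int(i) for i in np.flatnonzero(weights == 1.0)]
    remaining = [int(i) for i in np.flatnonzero(weights > 0) if weights[i] < 1.0]
    # Iterative weighted selection over everyone else with a top-k claim.
    while len(winners) < k and remaining:
        p = weights[remaining] / weights[remaining].sum()
        pick = int(rng.choice(remaining, p=p))
        winners.append(pick)
        remaining.remove(pick)
    return np.array(winners[:k])

scores = np.random.default_rng(0).normal(size=(5, 100))  # 5 rival models
print(multiplicity_selection(scores, k=25, seed=1))
```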