Stability and Multigroup Fairness in Ranking with Uncertain Predictions

Authors: Siddartha Devic, Aleksandra Korolova, David Kempe, Vatsal Sharan

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run experiments to complement our theoretical results and further investigate the fairness-utility tradeoff. Our results demonstrate that r^UA is far more stable than r^{τ_opt} in practice, and also achieves higher utility than two baseline ranking functions. We use the US Census data set ACS (Ding et al., 2021) and the student dropout task Enrollment (Martins et al., 2021). Table 1 shows the stability of r^UA and r^{τ_opt} to noise introduced by neural networks trained with SGD, averaged over multiple runs. In Table 2, we report the utility of r^UA, the uniform ranking r^unif assigning each individual to each rank with equal probability, and Plackett-Luce rankings r^PL (Plackett, 1975; Luce, 1959). (A sampling sketch for these two baselines follows the table.)
Researcher Affiliation | Academia | (1) Department of Computer Science, University of Southern California; (2) Department of Computer Science and Public Affairs, Princeton University.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks with explicit labels such as 'Algorithm' or 'Pseudocode'.
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link or an explicit statement of code release) for the methodology described.
Open Datasets | Yes | We use the US Census data set ACS (Ding et al., 2021) and the student dropout task Enrollment (Martins et al., 2021).
Dataset Splits | No | For computational reasons, we restrict our experiments to a subset of the data for California with parameters survey_year='2018', horizon='1-Year', and survey='person'. These parameters are standard when using ACS for testing algorithmic fairness methods, due to the large amount of available data (see, e.g., the GitHub repository of Ding et al. (2021)). We are left with 378,817 entries, and use an 80/20 train/test split. In Enrollment, the target is a multiclass variable indicating whether an individual is an enrolled, graduated, or dropout student. After cleaning the data, we are left with 4,424 entries, on which we use an 80/20 train/test split. The paper specifies train/test splits but does not explicitly mention a validation split. (A data-loading sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory) used for running its experiments.
Software Dependencies | No | The paper mentions training 'three-layer MLP neural networks' with 'SGD' but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We train 30 simple three-layer MLP neural networks on the ACS data set and divide them into 15 pairs. Each pair of networks is initialized with the same (random) weight matrix, then trained separately with SGD. (A training-protocol sketch follows the table.)
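
The two utility baselines in the "Research Type" row are simple to instantiate. Below is a minimal, hedged Python sketch (not the authors' code): it samples a Plackett-Luce ranking from strictly positive predicted scores via the Gumbel trick, which is equivalent to repeatedly drawing the next-ranked item with probability proportional to its score, alongside the uniform ranking r^unif. The `scores` array is a made-up example.

```python
# Minimal sketch (not the authors' code) of the two baseline ranking
# functions: the uniform ranking r^unif and Plackett-Luce sampling r^PL.
import numpy as np

rng = np.random.default_rng(0)

def plackett_luce_ranking(scores, rng):
    """Sample a ranking where, at each step, the next-ranked item is drawn
    with probability proportional to its (strictly positive) score; the
    Gumbel trick implements this as a single vectorized argsort."""
    gumbel = rng.gumbel(size=len(scores))
    return np.argsort(-(np.log(scores) + gumbel))  # indices, best rank first

def uniform_ranking(n, rng):
    """r^unif: every individual is equally likely to land at every rank."""
    return rng.permutation(n)

scores = np.array([0.9, 0.4, 0.7, 0.1])  # hypothetical predicted merits
print(plackett_luce_ranking(scores, rng))
print(uniform_ranking(len(scores), rng))
```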
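
The "Open Datasets" and "Dataset Splits" rows describe loading the California ACS subset through the folktables package of Ding et al. (2021). A hedged sketch under those stated parameters follows; the ACSIncome task and the random seed are assumptions, since the report does not say which prediction target or seed the authors used.

```python
# Hedged sketch: loading the 2018 1-Year California ACS person survey with
# folktables (Ding et al., 2021) and applying the paper's 80/20 train/test
# split. ACSIncome and random_state are illustrative assumptions.
from folktables import ACSDataSource, ACSIncome
from sklearn.model_selection import train_test_split

data_source = ACSDataSource(survey_year='2018', horizon='1-Year', survey='person')
ca_data = data_source.get_data(states=['CA'], download=True)
X, y, group = ACSIncome.df_to_numpy(ca_data)

# 80/20 train/test split; no validation split is reported in the paper.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
```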
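
Finally, the paired-initialization protocol in the "Experiment Setup" row can be sketched in PyTorch as below. Layer widths, epochs, and the learning rate are illustrative assumptions; only the structure (identical initial weights, separate SGD training runs) comes from the paper.

```python
# Hedged sketch of the paired-training protocol: two three-layer MLPs share
# one random initialization and are then trained separately with SGD.
# All hyperparameters here are assumptions, not the paper's values.
import copy
import torch
import torch.nn as nn

def make_mlp(d_in, d_hidden=64, d_out=2):
    return nn.Sequential(
        nn.Linear(d_in, d_hidden), nn.ReLU(),
        nn.Linear(d_hidden, d_hidden), nn.ReLU(),
        nn.Linear(d_hidden, d_out),
    )

def train_sgd(model, loader, epochs=5, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for xb, yb in loader:  # data order differs between the two runs
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
    return model

net_a = make_mlp(d_in=10)
net_b = copy.deepcopy(net_a)  # identical initial weight matrices
# train_sgd(net_a, loader_a); train_sgd(net_b, loader_b)
# The disagreement between net_a's and net_b's predictions is the training
# noise whose effect on ranking stability Table 1 measures.
```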