Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making

Authors: Parand A. Alamdari, Toryn Q. Klassen, Elliot Creager, Sheila A. McIlraith

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we compare different methods for designing the augmented memory via two simulation studies. Experiments show how generating counterfactual memories during RL can improve the overall fairness and sample efficiency of training in dynamic settings with multiple stakeholders.
Researcher Affiliation | Academia | 1University of Toronto, Toronto, Canada; 2Vector Institute, Toronto, Canada; 3Schwartz Reisman Institute for Technology and Society, Toronto, Canada; 4University of Waterloo, Waterloo, Canada.
Pseudocode | Yes | Algorithm 1: Tabular Fair QCM.
Open Source Code | Yes | The code for all experiments is available at https://github.com/praal/remembering-to-be-fair.
Open Datasets | No | The paper describes two simulated environments ('Resource Allocation' and 'Simulated Lending') that are custom-built for its experiments. They are not publicly available datasets in the usual sense (no links, DOIs, or citations to established datasets); the data is generated by the simulation rather than loaded from a pre-existing public source.
Dataset Splits | No | The paper does not explicitly report train/validation/test splits, whether as percentages or counts, for its simulated environments. It mentions 'episode length' and 'phases of training' but no specific data partitioning.
Hardware Specification | Yes | We ran the experiments on a system with the following specification: 2.3 GHz Quad-Core Intel Core i7 and 32 GB of RAM.
Software Dependencies | No | The paper mentions a 'Deep RL framework', 'Deep Q Networks (DQN)', and 'GRU (Cho et al., 2014)', but provides no specific version numbers for libraries such as PyTorch, TensorFlow, or scikit-learn.
Experiment Setup | Yes | We set γ = 0.99, α = 0.1, and use epsilon-greedy for exploration. ε_s = 1.0 at the beginning for each state s; every time we visit state s, ε_s is multiplied by 0.95 (diminishing factor) and remains greater than 0.2 (Appendix A.2, technical details for tabular Q-learning experiments). Tables 1 and 2 also list 'Learning Rate', 'Discount Factor (γ)', 'Min Exploration Rate (ε)', 'Replay Buffer Size', and 'Batch Size'.
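
To make the quoted experiment setup concrete, the following is a minimal sketch of a tabular Q-learning loop using the reported hyperparameters (γ = 0.99, α = 0.1, per-state epsilon-greedy with a 0.95 decay factor and a 0.2 floor). It is not the paper's Fair QCM algorithm and does not implement its fairness memory; the environment interface (env.reset, env.step, env.n_actions) is a hypothetical stand-in for illustration only.

```python
# Sketch of the exploration/update schedule described in Appendix A.2
# (assumed environment interface; not the authors' released code).
import random
from collections import defaultdict

GAMMA, ALPHA = 0.99, 0.1
EPS_INIT, EPS_DECAY, EPS_MIN = 1.0, 0.95, 0.2

def train(env, episodes=1000):
    Q = defaultdict(lambda: [0.0] * env.n_actions)   # Q[state][action]
    eps = defaultdict(lambda: EPS_INIT)              # per-state exploration rate

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection using this state's own epsilon.
            if random.random() < eps[state]:
                action = random.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: Q[state][a])

            # Decay this state's epsilon on every visit, floored at EPS_MIN.
            eps[state] = max(EPS_MIN, eps[state] * EPS_DECAY)

            next_state, reward, done = env.step(action)

            # Standard one-step Q-learning update with gamma and alpha as above.
            target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = next_state
    return Q
```

In the paper's setting the "state" passed to such a learner would be augmented with the memory of past allocations, which is what makes the non-Markovian fairness criteria learnable; that augmentation is deliberately omitted here.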