Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making
Authors: Parand A. Alamdari, Toryn Q. Klassen, Elliot Creager, Sheila A. McIlraith
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we compare different methods for designing the augmented memory via two simulation studies. Experiments show how generating counterfactual memories during RL can improve the overall fairness and sample efficiency of training in dynamic settings with multiple stakeholders. |
| Researcher Affiliation | Academia | 1University of Toronto, Toronto, Canada 2Vector Institute, Toronto, Canada 3Schwartz Reisman Institute for Technology and Society, Toronto, Canada 4University of Waterloo, Waterloo, Canada. |
| Pseudocode | Yes | Algorithm 1 Tabular Fair QCM |
| Open Source Code | Yes | The code for all experiments is available at https://github.com/praal/remembering-to-be-fair. |
| Open Datasets | No | The paper describes two simulated environments ('Resource Allocation' and 'Simulated Lending') which are custom-built for the paper's experiments. They are not publicly available datasets in the typical sense (no links, DOIs, or citations to established datasets). The data is generated by the simulation, not loaded from a pre-existing public source. |
| Dataset Splits | No | The paper does not explicitly mention train/validation/test splits with percentages or counts for its simulated environments. It mentions 'episode length' and 'phases of training' but not specific data partitioning for different phases. |
| Hardware Specification | Yes | We ran the experiments on a system with the following specification: 2.3 GHz Quad-Core Intel Core i7 and 32 GB of RAM. |
| Software Dependencies | No | The paper mentions a 'Deep RL framework', 'Deep Q Networks (DQN)', and 'GRU (Cho et al., 2014)', but provides no version numbers for specific libraries such as PyTorch, TensorFlow, or scikit-learn. |
| Experiment Setup | Yes | We set γ = 0.99, α = 0.1, and use epsilon-greedy exploration: ϵ_s = 1.0 initially for each state s, and every time state s is visited, ϵ_s is multiplied by a decay factor of 0.95 while remaining at least 0.2 (Appendix A.2, 'Technical details for Tabular Q-Learning Experiments'). Tables 1 and 2 also list 'Learning Rate', 'Discount Factor (γ)', 'Min Exploration Rate (ϵ)', 'Replay Buffer Size', and 'Batch Size'. Illustrative sketches of this setup follow the table. |
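
The 'Research Type' and 'Pseudocode' rows above refer to an augmented memory and to 'Algorithm 1 Tabular Fair QCM'. As a reading aid only, here is a minimal Python sketch of one natural form such a memory could take in a resource-allocation setting: a per-stakeholder count of received resources paired with the environment observation to form an augmented state. The environment, the observation label, and all function names are assumptions made for illustration; they are not taken from the paper's Algorithm 1.

```python
from typing import Any, Tuple

def initial_memory(n_stakeholders: int) -> Tuple[int, ...]:
    """Memory starts as an all-zero allocation count, one entry per stakeholder."""
    return (0,) * n_stakeholders

def update_memory(memory: Tuple[int, ...], recipient: int) -> Tuple[int, ...]:
    """Record that `recipient` received one more unit of the resource."""
    counts = list(memory)
    counts[recipient] += 1
    return tuple(counts)

def augment(observation: Any, memory: Tuple[int, ...]) -> Tuple[Any, Tuple[int, ...]]:
    """Pair the raw observation with the memory; the result is hashable and can index a tabular Q-function."""
    return (observation, memory)

# Example: three stakeholders, stakeholder 1 has been served twice.
m = initial_memory(3)
m = update_memory(m, 1)
m = update_memory(m, 1)
state = augment("free", m)  # ("free", (0, 2, 0)) -- "free" is a hypothetical observation label
```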
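
The quoted experiment setup (γ = 0.99, α = 0.1, per-state ϵ starting at 1.0, multiplied by 0.95 on each visit, floored at 0.2) can be sketched as a generic tabular Q-learning skeleton. This is not the authors' released code; the data structures and function names are illustrative, and only the numeric hyperparameters come from the quoted appendix.

```python
import random
from collections import defaultdict

# Hyperparameters quoted from Appendix A.2 of the paper.
GAMMA = 0.99       # discount factor
ALPHA = 0.1        # learning rate
EPS_DECAY = 0.95   # per-visit multiplicative decay of the exploration rate
EPS_MIN = 0.2      # exploration rate never drops below this value

Q = defaultdict(float)              # Q[(state, action)] -> estimated value
epsilon = defaultdict(lambda: 1.0)  # per-state exploration rate, starts at 1.0

def select_action(state, actions):
    """Epsilon-greedy selection with a per-state epsilon that decays on every visit."""
    if random.random() < epsilon[state]:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    epsilon[state] = max(EPS_MIN, epsilon[state] * EPS_DECAY)
    return action

def q_update(state, action, reward, next_state, next_actions, done):
    """Standard one-step tabular Q-learning update."""
    target = reward
    if not done:
        target += GAMMA * max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```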