Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making

Authors: Parand A. Alamdari, Toryn Q. Klassen, Elliot Creager, Sheila A. McIlraith

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we compare different methods for designing the augmented memory via two simulation studies. Experiments show how generating counterfactual memories during RL can improve the overall fairness and sample efficiency of training in dynamic settings with multiple stakeholders.
Researcher Affiliation | Academia | 1University of Toronto, Toronto, Canada; 2Vector Institute, Toronto, Canada; 3Schwartz Reisman Institute for Technology and Society, Toronto, Canada; 4University of Waterloo, Waterloo, Canada.
Pseudocode | Yes | Algorithm 1: Tabular Fair QCM.
Open Source Code | Yes | The code for all experiments is available at https://github.com/praal/remembering-to-be-fair.
Open Datasets | No | The paper describes two simulated environments ('Resource Allocation' and 'Simulated Lending') that are custom-built for its experiments. They are not publicly available datasets in the usual sense (no links, DOIs, or citations to established datasets); the data is generated by the simulation rather than loaded from a pre-existing public source.
Dataset Splits | No | The paper does not explicitly report train/validation/test splits, whether as percentages or counts, for its simulated environments. It mentions 'episode length' and 'phases of training' but no specific data partitioning.
Hardware Specification | Yes | We ran the experiments on a system with the following specification: 2.3 GHz Quad-Core Intel Core i7 and 32 GB of RAM.
Software Dependencies | No | The paper mentions a 'Deep RL framework', 'Deep Q Networks (DQN)', and 'GRU (Cho et al., 2014)', but provides no specific version numbers for libraries such as PyTorch, TensorFlow, or scikit-learn.
Experiment Setup | Yes | We set γ = 0.99, α = 0.1, and use epsilon-greedy for exploration. ε_s = 1.0 at the beginning for each state s; every time we visit state s, ε_s is multiplied by 0.95 (diminishing factor) and remains greater than 0.2 (Appendix A.2, technical details for tabular Q-learning experiments). Tables 1 and 2 also list 'Learning Rate', 'Discount Factor (γ)', 'Min Exploration Rate (ε)', 'Replay Buffer Size', and 'Batch Size'.
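
To make the quoted experiment setup concrete, the following is a minimal sketch of a tabular Q-learning loop using the reported hyperparameters (γ = 0.99, α = 0.1, per-state epsilon-greedy with a 0.95 decay factor and a 0.2 floor). It is not the paper's Fair QCM algorithm and does not implement its fairness memory; the environment interface (env.reset, env.step, env.n_actions) is a hypothetical stand-in for illustration only.

```python
# Sketch of the exploration/update schedule described in Appendix A.2
# (assumed environment interface; not the authors' released code).
import random
from collections import defaultdict

GAMMA, ALPHA = 0.99, 0.1
EPS_INIT, EPS_DECAY, EPS_MIN = 1.0, 0.95, 0.2

def train(env, episodes=1000):
    Q = defaultdict(lambda: [0.0] * env.n_actions)   # Q[state][action]
    eps = defaultdict(lambda: EPS_INIT)              # per-state exploration rate

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection using this state's own epsilon.
            if random.random() < eps[state]:
                action = random.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: Q[state][a])

            # Decay this state's epsilon on every visit, floored at EPS_MIN.
            eps[state] = max(EPS_MIN, eps[state] * EPS_DECAY)

            next_state, reward, done = env.step(action)

            # Standard one-step Q-learning update with gamma and alpha as above.
            target = reward + (0.0 if done else GAMMA * max(Q[next_state]))
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = next_state
    return Q
```

In the paper's setting the "state" passed to such a learner would be augmented with the memory of past allocations, which is what makes the non-Markovian fairness criteria learnable; that augmentation is deliberately omitted here.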