Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

A Causal Lens for Learning Long-term Fair Policies

Authors: Jacob Lear, Lu Zhang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments to evaluate the policy optimization algorithms we have proposed and compare them with baselines regarding the achievement of long-term fairness.
Researcher Affiliation	Academia	Jacob Lear & Lu Zhang Department of Electrical Engineering and Computer Science University of Arkansas EMAIL
Pseudocode	No	The paper describes methodologies and policy optimizations (PPO, PPO-C, PPO-Cb) in paragraph form and mathematical equations (e.g., Section 3.3 Policy Optimization and Section 3.4 Causal Decomposition of Cπ(θ)), but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	All the code is available at https://github.com/j-proj/Causal-Lens-Fair-RL.
Open Datasets	Yes	We leverage the simulation environment developed in D'Amour et al. (2020) that is commonly used in related work (e.g., Yu et al. (2022); Hu et al. (2023)). ... Setting 1 uses probabilities generated using the Home Credit Default Risk dataset Montoya et al. (2018), and the probabilities for Setting 2 are from a dataset previously released by Lending Club Wagh (2017).
Dataset Splits	No	The paper describes using repayment probabilities generated by fitting a logistic regression model to credit score datasets (Home Credit Default Risk dataset Montoya et al. (2018) and Lending Club Wagh (2017)). However, it does not provide specific details on training/test/validation splits used for these datasets or for the overall experimental setup.
Hardware Specification	No	The paper describes a simulation environment for experiments but does not provide any specific details about the hardware used to run these simulations or train the models.
Software Dependencies	No	The paper refers to policy optimization algorithms like PPO but does not specify any particular software libraries, programming languages, or version numbers used for its implementation or experiments.
Experiment Setup	Yes	Our choice for policy optimization is mostly typical as a variant of Proximal Policy Optimization (PPO) that incorporates the KL divergence as a penalty Schulman et al. (2017). ... The final objective function is obtained by incorporating Λ into Eq. (2): J(θ) = L^UTIL - β_KL L^KL - β_C (Ĉ_π)^2 - β_Λ Λ. ... In Figure 7, a larger β_KL can help reduce the model variance. ... In Figure 8, we show the influence of the strength of enforcing benefit fairness on long-term fairness. Notably, we observe that the loan rate for β_Λ = 2 is more balanced than that for β_Λ = 0.
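
As a rough illustration of the objective quoted in the Experiment Setup row, the sketch below combines a utility term with three weighted penalties. It assumes the penalties are subtracted from the utility term, which is one reading of the quoted objective and is not confirmed on this page. The function name, argument names, and default coefficients are hypothetical and are not taken from the paper or its repository.

```python
def penalized_objective(utility, kl, fairness_gap, benefit_gap,
                        beta_kl=1.0, beta_c=1.0, beta_lambda=0.0):
    """Hypothetical sketch of a penalized policy objective.

    utility      -- utility surrogate term (L^UTIL in the quoted objective)
    kl           -- KL-divergence penalty term (L^KL)
    fairness_gap -- estimated long-term fairness measure (C-hat_pi), squared below
    benefit_gap  -- benefit-fairness term (Lambda)

    Assumption: every penalty is subtracted, so a larger coefficient
    trades utility for a smaller penalty term.
    """
    return (utility
            - beta_kl * kl
            - beta_c * fairness_gap ** 2
            - beta_lambda * benefit_gap)


# Setting beta_lambda to 0 disables the benefit-fairness penalty,
# mirroring the beta_Lambda = 0 versus beta_Lambda = 2 comparison quoted above.
without_benefit = penalized_objective(1.0, 0.5, 0.5, 0.25, beta_lambda=0.0)
with_benefit = penalized_objective(1.0, 0.5, 0.5, 0.25, beta_lambda=2.0)
```

With these illustrative inputs, a nonzero `beta_lambda` lowers the objective whenever `benefit_gap` is positive, so an optimizer maximizing it would be pushed toward a smaller benefit gap.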