Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Reinforcement Learning Framework for Dynamic Mediation Analysis
Authors: Lin Ge, Jitao Wang, Chengchun Shi, Zhenke Wu, Rui Song
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The superior performance of the proposed method is demonstrated through extensive numerical studies, theoretical results, and an analysis of a mobile health dataset. |
| Researcher Affiliation | Academia | (1) North Carolina State University; (2) University of Michigan, Ann Arbor; (3) London School of Economics and Political Science. |
| Pseudocode | No | No pseudocode or algorithm blocks were explicitly labeled or presented in the paper. |
| Open Source Code | Yes | A Python implementation of the proposed procedure is available at https://github.com/linlinlin97/MediationRL. |
| Open Datasets | Yes | In this section, we apply the proposed MR estimators to analyze the real dataset from the IHS (NeCamp et al., 2020). |
| Dataset Splits | Yes | It is worth noting that we used cross-validation to estimate the ATE of πopt. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications, or cluster configurations) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions 'A Python implementation' but does not specify version numbers for Python or any other software dependencies, libraries, or solvers used for the experiments. |
| Experiment Setup | Yes | We consider a scenario with discrete states, actions, mediators, and rewards. We set time $T = 50$, and $S_0$ for each trajectory is sampled from a Bernoulli distribution with a mean probability of 0.5. Denote the sigmoid function as $\operatorname{expit}(\cdot)$. Following the behavior policy, the action $A_t \in \{0, 1\}$ is sampled from a Bernoulli distribution, where $\Pr(A_t = 1 \mid S_t) = \operatorname{expit}(1.0 - 2.0 S_t)$. Observing $S_t$ and $A_t$, the mediator $M_t \in \{0, 1\}$ is drawn from a Bernoulli distribution with $\Pr(M_t = 1 \mid S_t, A_t) = \operatorname{expit}(1.0 - 1.5 S_t + 2.5 A_t)$. |
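
The sampling scheme quoted in the Experiment Setup row can be illustrated with a minimal sketch. It covers only the quantities the quoted text specifies: the initial state, the behavior-policy action, and the mediator. The reward model and state transition are not given in the quoted passage, so they are left out rather than guessed. The function names (`action_prob`, `mediator_prob`, `sample_initial_step`) are hypothetical and are not taken from the authors' repository, and the coefficients are read here as `1.0 - 2.0 * S_t` and `1.0 - 1.5 * S_t + 2.5 * A_t`, which is an assumption about the signs.

```python
import numpy as np


def expit(x):
    """Sigmoid function, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def action_prob(s):
    """Pr(A_t = 1 | S_t = s) under the behavior policy (signs assumed)."""
    return expit(1.0 - 2.0 * s)


def mediator_prob(s, a):
    """Pr(M_t = 1 | S_t = s, A_t = a) (signs assumed)."""
    return expit(1.0 - 1.5 * s + 2.5 * a)


def sample_initial_step(n_traj, seed=0):
    """Draw (S_0, A_0, M_0) for n_traj independent trajectories.

    Only the first time step is simulated because the quoted setup
    does not state the reward or state-transition distributions.
    """
    rng = np.random.default_rng(seed)
    s0 = rng.binomial(1, 0.5, size=n_traj)        # S_0 ~ Bernoulli(0.5)
    a0 = rng.binomial(1, action_prob(s0))         # A_0 | S_0
    m0 = rng.binomial(1, mediator_prob(s0, a0))   # M_0 | S_0, A_0
    return s0, a0, m0


if __name__ == "__main__":
    s0, a0, m0 = sample_initial_step(n_traj=5)
    print(s0, a0, m0)
```

Extending this to the full horizon of T = 50 would require the transition and reward models from the paper itself.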