Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning
Authors: Shuguang Yu, Shuxing Fang, Ruixin Peng, Zhengling Qi, Fan Zhou, Chengchun Shi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the effectiveness of the proposed estimator through theoretical results and numerical experiments. |
| Researcher Affiliation | Academia | Shuguang Yu, School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China; Shuxing Fang, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China; Ruixin Peng, School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China; Zhengling Qi, Department of Decision Sciences, George Washington University, Washington D.C., USA; Fan Zhou, School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China; Chengchun Shi, Department of Statistics, London School of Economics and Political Science, London, UK |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code is available on GitHub: https://github.com/fsmiu/Two-way-Deconfounder. |
| Open Datasets | Yes | The data is sensitive and subject to a data use agreement, but it is publicly accessible at https://physionet.org/content/mimiciii/1.4/. |
| Dataset Splits | Yes | Each dataset undergoes a 75/25 split into training and validation sets, respectively (a sketch of such a split appears after the table). |
| Hardware Specification | Yes | The Two-way Deconfounder model described in Section 3 was implemented in PyTorch and trained on an NVIDIA GeForce RTX 3090 (a hedged model sketch follows the table). |
| Software Dependencies | No | The Two-way Deconfounder model described in Section 3 was implemented in PyTorch... (no version number for PyTorch or other dependencies is provided). |
| Experiment Setup | Yes | The search range for each hyperparameter is as follows: learning rate lr ∈ {0.005, 0.001}; batch size bs ∈ {2^8, 2^9, 2^10, 2^11, 2^12}; weight decay λ ∈ {0.01, 0.0001}; two-way embedding dimension d_tw ∈ {2^1, 2^2, 2^3}; loss weighting α ∈ {0.0, 0.3, 0.5, 0.7} (expressed as a code grid below the table). |
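
The table notes that the model was implemented in PyTorch; the paper's exact architecture lives in its Section 3 and repository. As a hedged illustration only, a two-way embedding layer that pairs a per-trajectory embedding with a per-time-step embedding might look like the sketch below. All names here (`TwoWayEmbedding`, `n_traj`, `horizon`, `d_tw`) are ours, not the authors'.

```python
import torch
import torch.nn as nn

class TwoWayEmbedding(nn.Module):
    """Hypothetical sketch of a two-way embedding: one learned vector per
    trajectory and one per time step, combined to proxy the unmeasured
    confounder. Names and composition are illustrative assumptions."""

    def __init__(self, n_traj: int, horizon: int, d_tw: int):
        super().__init__()
        self.traj_emb = nn.Embedding(n_traj, d_tw)   # trajectory-specific factor
        self.time_emb = nn.Embedding(horizon, d_tw)  # time-specific factor

    def forward(self, traj_idx: torch.Tensor, time_idx: torch.Tensor) -> torch.Tensor:
        # Sum the two factors; the paper may use a different composition.
        return self.traj_emb(traj_idx) + self.time_emb(time_idx)

# Usage: embed (trajectory 3, time step 7) into a d_tw-dimensional proxy.
emb = TwoWayEmbedding(n_traj=100, horizon=20, d_tw=4)
z = emb(torch.tensor([3]), torch.tensor([7]))  # shape: (1, 4)
```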
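
The 75/25 train/validation split reported in the table can be done at the trajectory level; a minimal sketch follows (shuffle indices, cut at 75%). Splitting whole trajectories rather than individual transitions is our assumption, not a detail stated in the quoted text.

```python
import random

def train_val_split(n_trajectories: int, train_frac: float = 0.75, seed: int = 0):
    """Split trajectory indices 75/25 into train and validation sets.
    Splitting whole trajectories avoids leakage across time steps of the
    same trajectory; this detail is our assumption."""
    idx = list(range(n_trajectories))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * n_trajectories)
    return idx[:cut], idx[cut:]

train_idx, val_idx = train_val_split(100)  # 75 train, 25 validation
```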
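
The search ranges in the last row can be written as a small grid. A minimal sketch, assuming a plain exhaustive grid search; the selection loop and `train_and_validate` are placeholders we introduce, not the authors' code.

```python
import itertools

# Search space transcribed from the reported ranges (2^k values expanded).
search_space = {
    "lr": [0.005, 0.001],
    "batch_size": [2**k for k in range(8, 13)],   # 256 ... 4096
    "weight_decay": [0.01, 0.0001],
    "d_tw": [2**k for k in range(1, 4)],          # 2, 4, 8
    "alpha": [0.0, 0.3, 0.5, 0.7],
}

# Exhaustive grid: every combination of the values above.
grid = [dict(zip(search_space, values))
        for values in itertools.product(*search_space.values())]
print(f"{len(grid)} configurations")  # 2 * 5 * 2 * 3 * 4 = 240

# Hypothetical selection loop; train_and_validate is a placeholder.
# best = min(grid, key=lambda cfg: train_and_validate(cfg))
```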