Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Policy Evaluation with Latent Confounders via Optimal Balance
Authors: Andrew Bennett, Nathan Kallus
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide empirical evidence demonstrating our method's consistent evaluation compared to standard evaluation methods and its improved performance compared to using fitted latent outcome models. |
| Researcher Affiliation | Collaboration | Andrew Bennett Cornell University EMAIL Nathan Kallus Cornell University EMAIL [...] This research was funded in part by JPMorgan Chase & Co. |
| Pseudocode | Yes | Algorithm 1 Optimal Kernel Balancing |
| Open Source Code | Yes | Code available online at https://github.com/CausalML/LatentConfounderBalancing. |
| Open Datasets | No | The paper uses synthetically generated data for its experiments, as described in Section 5.1 and Appendix B.1, rather than a publicly available dataset with access information. Thus, no public dataset is used for training. |
| Dataset Splits | No | The paper describes experiments conducted on synthetically generated data, specifying sample sizes (n ∈ {200, 500, 1000, 2000}) and averaging over 64 runs. However, it does not specify traditional train/validation/test splits for a fixed dataset, as the data is generated for each run. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as CPU or GPU models, memory, or cloud instance types. |
| Software Dependencies | Yes | We used Python 3.6. [...] Training of the µ functions for the Direct methods was done using the Adam optimizer [25] in PyTorch [34] (version 1.0.0). |
| Experiment Setup | Yes | In our experiments Z is 1-dimensional, X is 10-dimensional, and we have two possible treatment levels (m = 2). We experiment with a parametric policy and multiple link functions g as follows: π_t(X) = exp(θ_t^⊤ X) / Σ_{t'} exp(θ_{t'}^⊤ X); step: g(w) = 3·1{w ≥ 0} − 6; exp: g(w) = exp(w); cubic: g(w) = w³; linear: g(w) = w [...] The Gaussian kernel bandwidth was set to 0.1 for all experiments. Finally we detail all choices for scenario parameters in Appendix B.1, and provide implementation details of our methods in Appendix B.2. |
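The parametric policy quoted in the Experiment Setup row can be sketched as a small NumPy snippet. This is a minimal illustration, not the authors' implementation: the θ values and sample draws are placeholders (the paper's actual scenario parameters are in its Appendix B.1), and the step link function is a best-effort reconstruction of a garbled formula in the extracted text.

```python
import numpy as np

def softmax_policy(theta, X):
    """pi_t(x) = exp(theta_t^T x) / sum_t' exp(theta_t'^T x)."""
    logits = X @ theta.T                          # shape (n, m)
    logits -= logits.max(axis=1, keepdims=True)   # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# The four link functions g from the quoted setup; "step" is a
# reconstruction of garbled extracted text and may differ from the paper.
LINKS = {
    "step":   lambda w: 3.0 * (w >= 0) - 6.0,
    "exp":    lambda w: np.exp(w),
    "cubic":  lambda w: w ** 3,
    "linear": lambda w: w,
}

rng = np.random.default_rng(0)
n, d, m = 200, 10, 2                  # smallest n from the paper's grid; 10-dim X; m = 2
X = rng.normal(size=(n, d))           # placeholder covariate draw, not the paper's scenario
theta = rng.normal(size=(m, d))       # illustrative policy parameters
pi = softmax_policy(theta, X)         # each row is a distribution over the m treatments
```

Each row of `pi` sums to one, so the policy assigns a valid probability to each of the two treatment levels for every sample.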