Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient Policy Evaluation Across Multiple Different Experimental Datasets
Authors: Yonghan Jung, Alexis Bellot
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verified the robustness of estimators through simulations. In this section, we demonstrate the proposed estimators in Defs. (5,7) for combining multiple experimental datasets from different domains. We first compared the estimators on synthetic data to provide evidence of the fast convergence and doubly robustness behaviours of the proposed estimators. We conclude with an analysis of the ACTG 175 clinical trial [21] and Project STAR. |
| Researcher Affiliation | Collaboration | Yonghan Jung, Purdue University; Alexis Bellot, Independent Researcher. Footnote: Now at Google DeepMind. |
| Pseudocode | Yes | Definition 5 (DML for combining two experiments). Let $D^2 \sim P^2_{\sigma^2}(V)$, $D^1 \sim P^1_{\sigma^1}(V)$ and $D^0 \sim P^0(C)$. Let $L \geq 2$ denote a fixed number. 1. Sample split: For $\ell = 1, \dots, L$, randomly split $D^i$ for $i \in \{0, 1, 2\}$ into $L$ folds. The $\ell$-th partition of the sample is denoted $D^i_\ell$. The complement is $D^i_{-\ell} := D^i \setminus D^i_\ell$. 2. Nuisance estimation: For each $\ell = 1, \dots, L$, learn the estimators $\hat{\mu}^2_\ell$ and $\hat{\mu}^1_\ell$ for $\mu^2_0, \mu^1_0$ using samples $D^2_{-\ell}, D^1_{-\ell}$, respectively. Also, learn the estimators $\hat{\pi}^1_\ell, \hat{\pi}^2_\ell$ for $\pi^1_0, \pi^2_0$ using samples $D^i_{-\ell}$ for $i = 0, 1, 2$, respectively. 3. Evaluation: The DML estimator $\hat{\psi}$ for $\mathbb{E}_{P^0_{\sigma^0}}[Y]$ is then given as |
| Open Source Code | Yes | Codes corresponding to simulations are submitted as supplementary materials. [NeurIPS Checklist Q5 Justification]: The code will not be open sourced at this moment but we believe to have provided sufficient details to reproduce our results. |
| Open Datasets | Yes | We conclude with an analysis of the ACTG 175 clinical trial [21] and Project STAR. The dataset is publicly accessible from the R data repository: https://search.r-project.org/CRAN/refmans/AER/html/STAR.html. |
| Dataset Splits | Yes | Definition 5 (DML for combining two experiments). 1. Sample split: For $\ell = 1, \dots, L$, randomly split $D^i$ for $i \in \{0, 1, 2\}$ into $L$ folds. The $\ell$-th partition of the sample is denoted $D^i_\ell$. The complement is $D^i_{-\ell} := D^i \setminus D^i_\ell$. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using "XGBoost [12] to estimate nuisances" but does not specify a version number for XGBoost or any other software dependencies. |
| Experiment Setup | Yes | We ran 100 simulations for each $N \in \{2500, 5000, 10000, 20000\}$, where $N$ is the sample size. To enforce the convergence rate of nuisance estimates no faster than the decaying rate $n^{-1/4}$, we add $\epsilon$ to all nuisance estimates. This scenario is inspired by the experimental design discussed in [27]. The AE plots for combining two/multiple experiments are presented in Figs. (3a, 3b). |
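
The sample-split and nuisance-estimation steps quoted in the Pseudocode and Dataset Splits rows follow the usual cross-fitting pattern: fit a nuisance model on the complement of a fold, then score the held-out fold. The sketch below illustrates that pattern on a single dataset using only the Python standard library. It is not the authors' code; the names `split_folds`, `cross_fit`, `fit`, and `evaluate` are hypothetical, and the training-set mean stands in for the XGBoost nuisance models the paper mentions. In the paper's setting the same split would be applied to each dataset separately.

```python
import random
from statistics import mean


def split_folds(n, L, seed=0):
    """Randomly partition indices 0..n-1 into L folds of near-equal size."""
    if L < 2:
        raise ValueError("L must be at least 2")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[l::L] for l in range(L)]


def cross_fit(data, L, fit, evaluate, seed=0):
    """Fit on each fold's complement, score the held-out fold, average all scores.

    `fit` and `evaluate` are hypothetical user-supplied callables:
    fit(train_rows) -> model, evaluate(model, held_out_rows) -> list of scores.
    """
    folds = split_folds(len(data), L, seed)
    scores = []
    for fold in folds:
        held_out = set(fold)
        # Complement of the fold: every row that is not held out.
        train = [data[i] for i in range(len(data)) if i not in held_out]
        model = fit(train)
        scores.extend(evaluate(model, [data[i] for i in fold]))
    return mean(scores)


# Toy usage: the "nuisance" is the training mean, and each score is the
# prediction plus a residual correction, so the estimate tracks the sample mean.
y = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
estimate = cross_fit(
    y,
    L=3,
    fit=lambda train: mean(train),
    evaluate=lambda m, rows: [m + (v - m) for v in rows],
)
```

Because each row is scored by a model that never saw it during fitting, overfitting in the nuisance model does not leak into the final average, which is the motivation for the split.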