Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient Policy Evaluation Across Multiple Different Experimental Datasets

Authors: Yonghan Jung, Alexis Bellot

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically verified the robustness of estimators through simulations. In this section, we demonstrate the proposed estimators in Defs. (5,7) for combining multiple experimental datasets from different domains. We first compared the estimators on synthetic data to provide evidence of the fast convergence and double robustness behaviours of the proposed estimators. We conclude with an analysis of the ACTG 175 clinical trial [21] and Project STAR.
Researcher Affiliation Collaboration Yonghan Jung, Purdue University; Alexis Bellot, Independent Researcher (now at Google DeepMind).
Pseudocode Yes Definition 5 (DML for combining two experiments). Let $D^2 \sim P^2_{\pi^2}(V)$, $D^1 \sim P^1_{\pi^1}(V)$ and $D^0 \sim P^0(C)$. Let $L \geq 2$ denote a fixed number. 1. Sample split: For $\ell = 1, \dots, L$, randomly split $D^i$ for $i \in \{0, 1, 2\}$ into $L$ folds. The $\ell$-th partition of the sample is denoted $D^i_{\ell}$. The complement is $D^i_{-\ell} := D^i \setminus D^i_{\ell}$. 2. Nuisance estimation: For each $\ell = 1, \dots, L$, learn the estimator models $\hat{\mu}^2_{\ell}$ and $\hat{\mu}^1_{\ell}$ for $\mu^2_0, \mu^1_0$ using samples $D^2_{-\ell}, D^1_{-\ell}$, respectively. Also, learn the estimation models $\hat{\omega}^1_{\ell}, \hat{\omega}^2_{\ell}$ for $\omega^1_0, \omega^2_0$ using samples $D^i_{-\ell}$ for $i = 0, 1, 2$, respectively. 3. Evaluation: The DML estimator $\hat{\psi}$ for $E_{P^0_{\pi^0}}[Y]$ is then given as
Open Source Code Yes Codes corresponding to simulations are submitted as supplementary materials. [NeurIPS Checklist Q5 Justification]: The code will not be open sourced at this moment but we believe to have provided sufficient details to reproduce our results.
Open Datasets Yes We conclude with an analysis of the ACTG 175 clinical trial [21] and Project STAR. The dataset is publicly accessible from the R data repository: https://search.r-project.org/CRAN/refmans/AER/html/STAR.html.
Dataset Splits Yes Definition 5 (DML for combining two experiments). 1. Sample split: For $\ell = 1, \dots, L$, randomly split $D^i$ for $i \in \{0, 1, 2\}$ into $L$ folds. The $\ell$-th partition of the sample is denoted $D^i_{\ell}$. The complement is $D^i_{-\ell} := D^i \setminus D^i_{\ell}$.
Hardware Specification No The paper does not provide specific details about the hardware used, such as CPU or GPU models, or memory specifications.
Software Dependencies No The paper mentions using "XGBoost [12] to estimate nuisances" but does not specify a version number for XGBoost or any other software dependencies.
Experiment Setup Yes We ran 100 simulations for each $N \in \{2500, 5000, 10000, 20000\}$ where $N$ is the sample size. To enforce the convergence rate of nuisance estimates no faster than the decaying rate $n^{-1/4}$, we add $\epsilon$ to all nuisance estimates. This scenario is inspired by the experimental design discussed in [27]. The AE plots for combining two/multiple experiments are presented in Figs. (3a, 3b).
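
The sample-split step quoted from Definition 5 can be illustrated with a short sketch. This is not the authors' code: the helper names (`split_folds`, `cross_fit_indices`), the seeds, and the dataset sizes are hypothetical, and the sketch assumes the usual cross-fitting convention in which nuisances are fitted on the complement of a fold and evaluated on the fold itself.

```python
import numpy as np

def split_folds(n, L, seed=0):
    """Randomly partition the indices 0..n-1 into L disjoint folds."""
    if L < 2:
        raise ValueError("L must be at least 2")
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), L)

def cross_fit_indices(n, L, seed=0):
    """Yield (fold, complement) index arrays for each of the L folds."""
    folds = split_folds(n, L, seed)
    for ell in range(L):
        # Complement of fold ell: every index that is not in fold ell.
        complement = np.concatenate([folds[k] for k in range(L) if k != ell])
        yield folds[ell], complement

# Hypothetical sizes for the three datasets indexed by i in {0, 1, 2}.
sizes = {0: 10, 1: 12, 2: 9}
splits = {i: list(cross_fit_indices(n, L=3, seed=i)) for i, n in sizes.items()}
```

Each dataset is split independently, so a nuisance model for fold `ell` would be trained on the complement indices and then evaluated on the fold indices.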