Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An Online Sequential Test for Qualitative Treatment Effects

Authors: Chengchun Shi, Shikai Luo, Hongtu Zhu, Rui Song

JMLR 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive empirical studies are conducted to examine the finite sample performance of our test procedure. (Abstract); In this section, we conduct Monte Carlo simulations to examine the finite sample properties of the proposed test. (4.1.1); In this section, we apply the proposed method to a Yahoo! Today Module user click log dataset. (4.2) |
| Researcher Affiliation | Collaboration | Chengchun Shi (Department of Statistics, London School of Economics and Political Science); Shikai Luo (Tencent PCG); Hongtu Zhu (Department of Biostatistics, University of North Carolina); Rui Song (Department of Statistics, North Carolina State University) |
| Pseudocode | Yes | Algorithm 1: pseudocode summarizing the online bootstrap testing procedure. |
| Open Source Code | No | No explicit statement or link to the source code for the methodology described in this paper is provided. |
| Open Datasets | Yes | In this section, we apply the proposed method to a Yahoo! Today Module user click log dataset¹, which contains 45,811,883 user visits to the Today Module, during the first ten days in May 2009. ... ¹ https://webscope.sandbox.yahoo.com/catalog.php?datatype=r&did=49 |
| Dataset Splits | No | The paper describes a sequential online testing procedure and how users are dynamically assigned to different arms in A/B experiments, rather than providing fixed training/test/validation dataset splits for model training and evaluation. |
| Hardware Specification | Yes | We run our experiments on a single computer instance with 40 Intel(R) Xeon(R) 2.20GHz CPUs. It takes 1-2 seconds on average to compute each test. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | Yes | We generated the potential outcomes as $Y_i^*(a) = 1 + (X_{i1} - X_{i2})/2 + a\tau(X_i) + \varepsilon_i$, where the $\varepsilon_i$'s are i.i.d. $N(0, 0.5^2)$. The covariates $X_i = (X_{i1}, X_{i2}, X_{i3})$ were generated as follows... We consider two randomization designs... In addition, we set $N(T_1) = 2000$ and $N(T_j) - N(T_{j-1}) = 2n$ for $2 \le j \le K$ and some $n > 0$. We consider two combinations of $(n, K)$, corresponding to $(n, K) = (200, 5)$ and $(20, 50)$. We set the significance level $\alpha = 0.05$ and choose $B = 10000$. We set $\tau(X_i) = \varphi_\delta\{(X_{i1} + X_{i2})/\sqrt{2}\} X_{i3}^2$ for some function $\varphi_\delta$ parameterized by some $\delta \ge 0$... For all settings, we construct the basis function $\phi(\cdot)$ using additive cubic splines. For each univariate spline, we set the number of internal knots to be 4. These knots are equally spaced between $[-2, 2]$. |
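
The interim-analysis schedule and spline knots quoted in the Experiment Setup entry can be tabulated with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the function names `interim_sample_sizes` and `internal_knots` are hypothetical, and the knot placement assumes that -2 and 2 act as boundary knots with the 4 internal knots splitting the interval into 5 equal pieces, which is one plausible reading of "equally spaced between [-2, 2]".

```python
def interim_sample_sizes(n, K, first=2000):
    """Cumulative sample sizes N(T_1), ..., N(T_K).

    Uses N(T_1) = first and N(T_j) - N(T_{j-1}) = 2n for 2 <= j <= K,
    as stated in the quoted setup.
    """
    return [first + 2 * n * (j - 1) for j in range(1, K + 1)]


def internal_knots(lower=-2.0, upper=2.0, count=4):
    """Equally spaced internal knots strictly inside (lower, upper).

    Assumption: the endpoints are boundary knots, so `count` internal
    knots divide the interval into count + 1 equal pieces.
    """
    step = (upper - lower) / (count + 1)
    return [lower + step * (k + 1) for k in range(count)]


if __name__ == "__main__":
    # The two (n, K) combinations reported in the setup.
    for n, K in [(200, 5), (20, 50)]:
        sizes = interim_sample_sizes(n, K)
        print(f"(n, K) = ({n}, {K}): {len(sizes)} analyses, "
              f"first N = {sizes[0]}, last N = {sizes[-1]}")
    print("internal knots:", [round(k, 3) for k in internal_knots()])
```

Under these formulas, (n, K) = (200, 5) gives interim analyses at 2000, 2400, 2800, 3200, and 3600 observations, while (n, K) = (20, 50) gives 50 analyses ending at 3960 observations, so the two designs cover a similar total sample size with very different numbers of looks.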