Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
An Online Sequential Test for Qualitative Treatment Effects
Authors: Chengchun Shi, Shikai Luo, Hongtu Zhu, Rui Song
JMLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical studies are conducted to examine the finite sample performance of our test procedure. (Abstract); In this section, we conduct Monte Carlo simulations to examine the finite sample properties of the proposed test. (4.1.1); In this section, we apply the proposed method to a Yahoo! Today Module user click log dataset (4.2). |
| Researcher Affiliation | Collaboration | Chengchun Shi EMAIL Department of Statistics, London School of Economics and Political Science; Shikai Luo EMAIL Tencent PCG; Hongtu Zhu EMAIL Department of Biostatistics, University of North-Carolina; Rui Song EMAIL Department of Statistics, North-Carolina State University |
| Pseudocode | Yes | Algorithm 1: Pseudocode summarizing the online bootstrap testing procedure. |
| Open Source Code | No | No explicit statement or link to the source code for the methodology described in this paper is provided. |
| Open Datasets | Yes | In this section, we apply the proposed method to a Yahoo! Today Module user click log dataset (footnote 1), which contains 45,811,883 user visits to the Today Module, during the first ten days in May 2009. ... Footnote 1: https://webscope.sandbox.yahoo.com/catalog.php?datatype=r&did=49 |
| Dataset Splits | No | The paper describes a sequential online testing procedure and how users are dynamically assigned to different arms in A/B experiments, rather than providing fixed training/test/validation dataset splits for model training and evaluation. |
| Hardware Specification | Yes | We run our experiments on a single computer instance with 40 Intel(R) Xeon(R) 2.20GHz CPUs. It takes 1-2 seconds on average to compute each test. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | Yes | We generated the potential outcomes as $Y_i(a) = 1 + (X_{i1} - X_{i2})/2 + a\tau(X_i) + \varepsilon_i$, where the $\varepsilon_i$'s are i.i.d. $N(0, 0.5^2)$. The covariates $X_i = (X_{i1}, X_{i2}, X_{i3})$ were generated as follows... We consider two randomization designs... In addition, we set $N(T_1) = 2000$ and $N(T_j) - N(T_{j-1}) = 2n$ for $2 \le j \le K$ and some $n > 0$. We consider two combinations of $(n, K)$, corresponding to $(n, K) = (200, 5)$ and $(20, 50)$. We set the significance level $\alpha = 0.05$ and choose $B = 10000$. We set $\tau(X_i) = \varphi_\delta\{(X_{i1} + X_{i2})/\sqrt{2}\} X_{i3}^2$ for some function $\varphi_\delta$ parameterized by some $\delta \ge 0$... For all settings, we construct the basis function $\phi(\cdot)$ using additive cubic splines. For each univariate spline, we set the number of internal knots to be 4. These knots are equally spaced between $[-2, 2]$. |
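
The interim-analysis schedule quoted in the Experiment Setup row can be made concrete with a little arithmetic. The sketch below assumes the increment condition reads N(T_j) − N(T_{j−1}) = 2n with N(T_1) = 2000, so the cumulative sample size at the j-th analysis is 2000 + 2n(j − 1). The function name `analysis_sample_sizes` is hypothetical and is not taken from the paper.

```python
def analysis_sample_sizes(n, K, first=2000):
    """Cumulative sample size N(T_j) at each of the K interim analyses.

    Assumes N(T_1) = first and N(T_j) - N(T_{j-1}) = 2n for 2 <= j <= K.
    """
    return [first + 2 * n * (j - 1) for j in range(1, K + 1)]


# The two (n, K) combinations reported in the setup.
for n, K in [(200, 5), (20, 50)]:
    sizes = analysis_sample_sizes(n, K)
    print(f"(n, K) = ({n}, {K}): {len(sizes)} analyses, "
          f"first N = {sizes[0]}, last N = {sizes[-1]}")
```

Under this reading, the two designs end at similar total sample sizes (3600 and 3960) but differ sharply in how often the test is evaluated: 5 analyses versus 50.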