Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects
Authors: Santiago Cortes-Gomez, Naveen Janaki Raman, Aarti Singh, Bryan Wilder
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess our two-stage RCT design with both synthetic and real-world datasets. Synthetic Dataset and Setup: We construct a synthetic dataset to evaluate our two-stage RCT designs. We sample arm means, µ, from a uniform 0-1 distribution (we experiment with other choices in Appendix E). We compare our two-stage design against baselines and find that our sample splitting methods improve upon baselines. In Figure 1, we find that our sample splitting methods outperform single-stage methods across first-stage percentages. |
| Researcher Affiliation | Academia | 1Department of Machine Learning, Carnegie Mellon University. Correspondence to: Santiago Cortes Gomez <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Sample splitting design. 1: Input: s1 iid samples. 2: Output: Set π(X). 3: Split first stage data randomly into two sets: U = {x1, ..., x_{s1/2}} and V = {z1, ..., z_{s1/2}}. |
| Open Source Code | No | We include all code and datasets at hidden. Explanation: The paper states "We include all code and datasets at hidden", which is a placeholder typically used during double-blind review and does not provide concrete access to the code. |
| Open Datasets | No | We run semi-synthetic experiments where effect sizes are drawn according to a real-world distribution drawn from a meta-analysis of treatments in gerontology (Greising et al., 2009). Explanation: The paper uses a meta-analysis by Greising et al. (2009) as a source of effect sizes to *generate* a semi-synthetic dataset, rather than using the meta-analysis itself as a direct, publicly available dataset for experiments. It also mentions a 'synthetic dataset', which is generated by the authors. No concrete access information (link, DOI, specific repository) is provided for any dataset used in the experiments. |
| Dataset Splits | No | We sample arm means, µ, from a uniform 0-1 distribution (we experiment with other choices in Appendix E). Arms have Bernoulli outcomes with mean µi, which simulates settings where treatments are successful with probability µi. We fix n = 10 (we find similar results for other n in Appendix D) and δ = 0.1 (and find similar results for other δ in Appendix B). Explanation: The paper describes generating synthetic data and semi-synthetic experiments. It does not mention traditional dataset splits like training, validation, or test sets with specific percentages or counts. The budgets s1 and s2 refer to sample allocation across stages, not fixed dataset splits. |
| Hardware Specification | No | Explanation: The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | Explanation: The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | We sample arm means, µ, from a uniform 0-1 distribution (we experiment with other choices in Appendix E). Arms have Bernoulli outcomes with mean µi, which simulates settings where treatments are successful with probability µi. We fix n = 10 (we find similar results for other n in Appendix D) and δ = 0.1 (and find similar results for other δ in Appendix B). We compare the following RCT designs: 1. Random: two-stage top-k design with a random k. 2. Best Arm: two-stage design with k = 1. 3. Single-stage. 4. Sample Split: our proposed two-stage method, which uses the first stage to prune arms and the second stage to compute certificates. 5. Omniscient: a two-stage method which computes k with knowledge of µ. |
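
The synthetic setup and the first step of Algorithm 1 quoted in the table can be illustrated with a minimal sketch. This is not the authors' code (which is not publicly linked); the function names, the first-stage budget `s1 = 100`, the seed, and the assumption that each first-stage sample holds one Bernoulli outcome per arm are all hypothetical choices made for illustration. Only the uniform arm means, the Bernoulli outcomes, n = 10, and the random split into two halves U and V come from the quoted text.

```python
import random


def sample_arm_means(n_arms, rng):
    """Draw arm means from a uniform distribution on [0, 1)."""
    return [rng.random() for _ in range(n_arms)]


def draw_first_stage(mu, s1, rng):
    """Draw s1 iid samples; each sample has one Bernoulli outcome per arm.

    Assumption: the per-sample layout (one outcome per arm) is illustrative
    and is not specified in the quoted excerpts.
    """
    return [[1 if rng.random() < m else 0 for m in mu] for _ in range(s1)]


def split_first_stage(samples, rng):
    """Randomly split the first-stage samples into two halves U and V."""
    if len(samples) % 2 != 0:
        raise ValueError("s1 must be even to split into two equal halves")
    shuffled = samples[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(0)            # hypothetical seed
mu = sample_arm_means(10, rng)    # n = 10 arms, as in the quoted setup
samples = draw_first_stage(mu, 100, rng)  # s1 = 100 is a hypothetical budget
U, V = split_first_stage(samples, rng)
```

The sketch stops at the split because the quoted excerpt of Algorithm 1 does not show how U and V are used to construct the output set π(X).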