Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Strategic Hypothesis Testing

Authors: Yatong Chen, Safwan Hossain, Yiling Chen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically validate our model and these insights using publicly available data on drug approvals. Overall, our work offers a comprehensive perspective on strategic interactions within the hypothesis testing framework, providing technical and regulatory insights.
Researcher Affiliation	Academia	Yatong Chen, Max Planck Institute for Intelligent Systems, Tübingen AI Center, EMAIL; Safwan Hossain, Harvard University, EMAIL; Yiling Chen, Harvard University, EMAIL
Pseudocode	No	The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present any structured, code-like procedural steps.
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Code for the experiments are in the supplemental materials.
Open Datasets	Yes	We empirically validate our model and these insights using publicly available data on drug approvals. ... using public sources to capture relevant metrics for three classes of drugs: oncology, vaccines, and cardiovascular. ProRelix [27] mentions the per-participant expense (c) of vaccine testing to be $50,000, while oncology and cardiovascular trials being around $128,000 and $136,000 respectively. The fixed expenditure (c0) for clinical trials, regulatory approvals, and R&D, as well as the lifetime revenue (R) vary widely, even within drug categories. Prasad and Mailankody [28] highlights the median fixed expenditure to be around $650 million for oncology drugs... Analysis from Bhatt et al. [16] gives ranges of $74–$183 million for fixed costs of Cardiovascular drugs... Rashid and Chandel [31] suggests the corresponding revenue of approved drugs here... For vaccines, Sertkaya et al. [32] outlines revenues between 6.9 billion to 36.9 billion for blockbusters...
Dataset Splits	No	The paper describes using 'publicly available data on drug approvals' but does not specify any training, validation, or test splits. The empirical validation is a case study using real-world data points (costs, revenues) for calculations, rather than a machine learning experiment involving data partitioning.
Hardware Specification	Yes	The experiments were conducted using a MacBook with only CPU resources and the NumPy package [33].
Software Dependencies	No	The paper mentions using 'the NumPy package [33]' but does not provide a specific version number for NumPy or Python, which is required for a reproducible description of software dependencies.
Experiment Setup	Yes	Due to the variability of fixed costs c0 and revenue R as compared to per-sample cost c, we fix c and plot α̂ for different revenue and fixed costs. These ranges are centered around the median values (where available) gathered in the table. We plot the figures for oncology and cardiovascular categories above (see Figures 2 and 3; the vaccines figure is in Appendix E). Each plot also displays the α = 0.05 boundary, a commonly used p-value threshold by the FDA [34].
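The Experiment Setup row describes a parameter sweep: the per-sample cost c is held fixed, the fixed cost c0 and revenue R are varied over ranges centered on median values, and the α = 0.05 boundary is overlaid. The sketch below reproduces only that grid structure in plain Python. The paper's expression for α̂ is not given in this entry, so `alpha_hat` is a caller-supplied function; the names `centered_range` and `sweep_alpha_hat` and the symmetric ±`rel_width` range construction are hypothetical assumptions, not the authors' code. The cost constants are the values quoted in the Open Datasets row.

```python
from typing import Callable, List, Tuple

# Per-participant trial cost c in USD, as quoted from ProRelix [27] in the Open Datasets row.
PER_PARTICIPANT_COST = {"vaccines": 50_000, "oncology": 128_000, "cardiovascular": 136_000}

# Median fixed cost c0 for oncology drugs in USD, as quoted from Prasad and Mailankody [28].
ONCOLOGY_MEDIAN_C0 = 650e6

FDA_ALPHA = 0.05  # commonly used p-value threshold [34]


def centered_range(center: float, rel_width: float, num: int) -> List[float]:
    """Evenly spaced values from center*(1 - rel_width) to center*(1 + rel_width), inclusive.

    Hypothetical helper: the symmetric relative width is an assumption about
    how a range "centered around the median" might be built.
    """
    if num < 2:
        return [center]
    lo, hi = center * (1 - rel_width), center * (1 + rel_width)
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def sweep_alpha_hat(
    alpha_hat: Callable[[float, float, float], float],  # hypothetical: supply alpha_hat(c, c0, R)
    c: float,
    c0_values: List[float],
    r_values: List[float],
    boundary: float = FDA_ALPHA,
) -> List[List[Tuple[float, bool]]]:
    """Hold c fixed and evaluate alpha_hat over a (c0, R) grid.

    Row i corresponds to c0_values[i], column j to r_values[j]. Each cell
    stores (value, value < boundary), i.e. which side of the
    alpha = boundary contour that grid point falls on.
    """
    grid = []
    for c0 in c0_values:
        row = []
        for r in r_values:
            value = alpha_hat(c, c0, r)  # evaluate once per grid point
            row.append((value, value < boundary))
        grid.append(row)
    return grid
```

For oncology, one might call `sweep_alpha_hat(alpha_hat, PER_PARTICIPANT_COST["oncology"], centered_range(ONCOLOGY_MEDIAN_C0, 0.5, 25), r_values)`, where `r_values` would come from the revenue figures in [31] (not reproduced in this entry) and the ±50% width is an arbitrary illustrative choice. Passing α̂ as a function keeps the sweep independent of the model's closed form.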
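The Software Dependencies row is marked No because no Python or NumPy version is reported, and the Hardware Specification row names only a CPU-only MacBook. A minimal sketch of how such details could be captured alongside results, using the standard library's `platform` and `os` modules and NumPy's `__version__` attribute. The function name `environment_report` and the chosen fields are hypothetical; this is not code from the paper's supplemental materials.

```python
import os
import platform


def environment_report() -> dict:
    """Collect Python/NumPy versions and basic CPU information for a methods section."""
    try:
        import numpy  # the package cited as [33]
        numpy_version = numpy.__version__
    except ImportError:
        numpy_version = "not installed"
    return {
        "python": platform.python_version(),  # e.g. "3.11.4"
        "numpy": numpy_version,
        "os": platform.platform(),            # OS name and release
        "machine": platform.machine(),        # CPU architecture, e.g. "arm64"
        "cpu_count": os.cpu_count(),          # logical CPUs (may be None)
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Printing this report next to each figure is a lightweight way to pin the exact versions that produced it.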