Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Hedging and Approximate Truthfulness in Traditional Forecasting Competitions

Authors: Mary Monroe, Anish Thilagar, Melody Hsu, Rafael Frongillo

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This paper gives the first such analysis. We first demonstrate that the long-run truthfulness folklore is false: even for arbitrary numbers of events, the best forecaster can have an incentive to hedge, reporting more moderate beliefs to increase their win probability. On the positive side, however, we show that two contestants will be approximately truthful when they have sufficient uncertainty over the relative quality of their opponent and the outcomes of the events, a case which may arise in practice. The formal proof is in the extended version of the paper. Our proof is geometric, leveraging the fact that a forecaster wins when their report is the closest to the outcome in Euclidean distance, as discussed in Section 2.1. Given any ϵ ℓ2 approximately truthful reports r for all forecasters, we show that hedging to r is a strictly dominant strategy for i. Specifically, given any vertex y, we show that either r is closer to y than any r_j for j ≠ i, or r is closer to y than r_i is to y. See Figure 3. The implication is that r wins on a strict superset of outcomes that r_i wins. As shorthand, let us define the following distances, all from a report vector to some vertex y: d_p(y) = ‖p − y‖_2, d(y) = ‖r − y‖_2, d_i(y) = ‖r_i − y‖_2, d_j(y) = ‖r_j − y‖_2.
Researcher Affiliation	Academia	Mary Monroe, Anish Thilagar, Melody Hsu, Rafael Frongillo University of Colorado Boulder
Pseudocode	No	The paper describes mathematical models, definitions, and theorems, such as Definition 1 (Strategy), Definition 2 (Simple Max), Definition 3 (Approximately truthful), Theorem 1, and Theorem 2. It does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper discusses theoretical findings and points to an extended version on arXiv, but does not provide any statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets	No	The paper describes a theoretical model involving 'm independent binary events Y1, ..., Ym ∈ {0, 1}' and 'p-biased coins'. It also refers to general examples of forecasting tournaments like 'Good Judgment Project' and 'Kaggle' for context, but it does not use or provide access to any specific dataset for empirical evaluation.
Dataset Splits	No	As this paper presents theoretical analysis, it does not involve empirical experiments requiring dataset splits.
Hardware Specification	No	This paper provides a theoretical analysis of forecasting competitions and does not describe any experiments that would require specific hardware for execution.
Software Dependencies	No	This paper focuses on theoretical analysis and does not mention any specific software dependencies with version numbers that would be required to reproduce experimental results.
Experiment Setup	No	This paper presents theoretical results and mathematical models, not empirical experiments, and therefore does not include details on experimental setup, hyperparameters, or training configurations.
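
The Research Type row quotes the paper's win rule: a forecaster wins when their report is the closest to the outcome in Euclidean distance. The paper itself provides no code, so the following is only an illustrative sketch of that rule, not the authors' method. It enumerates every outcome vertex in {0, 1}^m for m independent binary events and sums the probability of the outcomes each forecaster wins. The function name `win_probabilities`, the tolerance `tol`, and the even splitting of ties are assumptions made for this example.

```python
from itertools import product
import math


def win_probabilities(p, reports, tol=1e-12):
    """Exact win probability of each forecaster (hypothetical helper).

    p       -- true probabilities of the m independent binary events
    reports -- one length-m report vector per forecaster
    tol     -- assumed tolerance for treating two distances as tied
    """
    wins = [0.0] * len(reports)
    # Visit every vertex y of the hypercube {0, 1}^m.
    for y in product((0, 1), repeat=len(p)):
        # Probability of outcome y under independent p-biased coins.
        prob_y = math.prod(pk if yk == 1 else 1.0 - pk for pk, yk in zip(p, y))
        # Euclidean distance from each report to the outcome.
        dists = [math.dist(r, y) for r in reports]
        best = min(dists)
        winners = [i for i, d in enumerate(dists) if d - best <= tol]
        # Assumption: tied forecasters split the win evenly.
        for i in winners:
            wins[i] += prob_y / len(winners)
    return wins


if __name__ == "__main__":
    # One event with true probability 0.8; forecaster 0 reports truthfully.
    print(win_probabilities([0.8], [[0.8], [0.9]]))
```

To explore hedging in this sketch, one could hold the opponents' reports fixed and compare the first forecaster's win probability under a truthful report with the value under a more moderate report. Exhaustive enumeration costs 2^m evaluations, so it is only practical for small m.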