Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Commitment to Sparse Strategies in Two-Player Games
Authors: Salam Afiouni, Jakub Černý, Chun Kai Ling, Christian Kroer
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods on synthetic and real-world scenarios based on security applications. In both settings, we observe that even for small support sizes, we can obtain more than 90% of the true Nash value while maintaining a reasonable runtime, demonstrating the significance of our formulation and algorithms. 5 Empirical Evaluation We empirically analyze the performance and scalability of our proposed methods, focusing on (i) wall-clock computational time, (ii) the utility that Player 1 achieves for different support sizes k, and (iii) comparisons to k-uniform strategies. |
| Researcher Affiliation | Academia | Salam Afiouni1, Jakub Černý1, Chun Kai Ling2, Christian Kroer1 1Columbia University, USA 2National University of Singapore, Singapore EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the main text. The methods are described using mathematical formulations and textual explanations. |
| Open Source Code | Yes | Code: github.com/CoffeeAndConvexity/SparseEquilibria |
| Open Datasets | No | The paper describes generating synthetic datasets and using real-world scenarios ("university campus path network", "maps from the real world") but does not provide specific access information (link, DOI, formal citation) for any publicly available or open datasets used in the experiments. |
| Dataset Splits | No | The paper describes generating various instances for evaluation (e.g., "30 instances were generated and solved" for random games, "30 instances" for patrolling games) rather than splitting a pre-existing dataset into training, validation, or test sets. Therefore, no dataset split information is provided. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (CPU, GPU models, memory, etc.) used for running the experiments. It only mentions the use of Gurobi for MILP solvers. |
| Software Dependencies | Yes | We use Gurobi (Gurobi Optimization, LLC 2023) for all MILP solvers. |
| Experiment Setup | Yes | For zero-sum games, each Aij is uniformly chosen in [10, 100]. In general-sum games, Aij and Bij are randomly chosen in [−50, 50], under the condition that Aij and Bij have opposite signs. For purposes of comparison, we normalize running times using t_normalized = (t − t_min) / (t_max − t_min), where t_min and t_max are the minimum and maximum runtimes respectively. We normalize the utility value by the Nash value for zero-sum games and the Stackelberg equilibrium (SE) for general-sum games. Specifically, u_normalized = (u − u_min) / (u_equi − u_min), where u_min is the utility at k = 1 and u_equi is the true value, i.e., k = ∞. Unless otherwise stated, for experiments with randomness, 30 instances were generated and solved. We report the standard errors in plots (usually negligible). Other experimental details are deferred to the Appendix. |
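The Experiment Setup row describes two payoff-generation schemes and two normalization formulas. The sketch below illustrates them as described; the function names, the use of NumPy, and the specific mechanism for enforcing opposite signs in the general-sum case are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_zero_sum_payoffs(n, m):
    # Zero-sum case: each A_ij drawn uniformly from [10, 100].
    return rng.uniform(10, 100, size=(n, m))

def random_general_sum_payoffs(n, m):
    # General-sum case: A_ij and B_ij lie in [-50, 50] with opposite
    # signs. One way to realize this (an assumption on our part): draw
    # magnitudes in [0, 50], then pick a random sign for A and negate it for B.
    mag_a = rng.uniform(0, 50, size=(n, m))
    mag_b = rng.uniform(0, 50, size=(n, m))
    sign = rng.choice([-1.0, 1.0], size=(n, m))
    return sign * mag_a, -sign * mag_b

def normalize_runtime(t, t_min, t_max):
    # t_normalized = (t - t_min) / (t_max - t_min)
    return (t - t_min) / (t_max - t_min)

def normalize_utility(u, u_min, u_equi):
    # u_normalized = (u - u_min) / (u_equi - u_min), where u_min is the
    # utility at k = 1 and u_equi is the Nash (or SE) value at k = infinity.
    return (u - u_min) / (u_equi - u_min)
```

With this normalization, a method that reaches 90% of the true Nash value at some support size k reports `normalize_utility(u, u_min, u_equi) == 0.9`; both normalized quantities map their respective ranges onto [0, 1].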