Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Game-Theoretic Question Selection for Tests

Authors: Yuqian Li, Vincent Conitzer

JAIR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments are also provided for those proposed algorithms to show their scalability and the increase of the tester's utility relative to that of the uniform-at-random strategy.
Researcher Affiliation	Academia	Yuqian Li EMAIL Vincent Conitzer EMAIL Duke University Durham, NC 27707 USA
Pseudocode	Yes	Algorithm 1 Input: A binary test game with t = 1 and an optimal primal solution (U, (zθ,q)) to LP (4). 1: T ← {q | U0_q − Σ_{θ: q ∈ Hθ} p(θ) vθ zθ,q = U} 2: S ← {θ : Σ_{q ∈ Hθ ∩ T} zθ,q < mθ} 3: let all θ be unmarked 4: while S has an unmarked element do 5: θ ← an unmarked element from S 6: for all q ∈ Hθ ∩ T and zθ,q < 1 do 7: S ← S ∪ {θ′ ∈ Θ : q ∈ Hθ′ ∧ zθ′,q > 0} 8: T ← T \ {q} 11: end while 12: return the uniform distribution over T
Open Source Code	No	The paper does not provide any explicit statement about making the source code available, nor does it include a link to a code repository.
Open Datasets	No	For each experimental data point, we specify three parameters: the number of questions n, the number of types L (|Θ| = L), and the maximum memory size mmax. We always set bmax, the maximum size of any Hθ, to 2mmax. Given those parameters, a test game instance is randomly generated as follows: for each θ ∈ Θ, draw mθ uniformly from 1 to mmax; draw |Hθ| uniformly from mθ to bmax; generate Hθ by drawing |Hθ| elements from Q uniformly; and draw wθ = p(θ)vθ uniformly from [0, 1] (these two factors always appear together).
Dataset Splits	No	For each data point, we generate 5 test game instances and compute the average running time for each algorithm. We set a timeout of 5 seconds for each instance. Figures 2(a), 2(b), and 2(c) show how the algorithms scale in n, L, and mmax, respectively, holding the other parameters fixed.
Hardware Specification	Yes	Our machine has an Intel i7-2600 3.40GHz CPU and 8GB memory.
Software Dependencies	Yes	In particular, we use CPLEX 12.6.0.0 and the boost 1.46.1 C++ library for Edmonds-Karp and Push-Relabel.
Experiment Setup	Yes	For each experimental data point, we specify three parameters: the number of questions n, the number of types L (|Θ| = L), and the maximum memory size mmax. We always set bmax, the maximum size of any Hθ, to 2mmax. Given those parameters, a test game instance is randomly generated as follows: for each θ ∈ Θ, draw mθ uniformly from 1 to mmax; draw |Hθ| uniformly from mθ to bmax; generate Hθ by drawing |Hθ| elements from Q uniformly; and draw wθ = p(θ)vθ uniformly from [0, 1] (these two factors always appear together). For each data point, we generate 5 test game instances and compute the average running time for each algorithm. We set a timeout of 5 seconds for each instance. We use the network-flow approach from Definition 4 with binary search on U to a precision of 10^-8
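
Because no dataset is released, the random instance generator quoted in the Experiment Setup row is the only route to comparable inputs. The following is a minimal Python sketch of that procedure, not the authors' code (the paper reports a C++ setup). The names `TakerType` and `generate_test_game` are hypothetical, questions are represented as integers 0..n-1, and two details are assumptions that the quoted text does not settle: Hθ is sampled without replacement, and 2·mmax must not exceed n.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TakerType:
    """One test-taker type theta (hypothetical container, not from the paper)."""
    memory_size: int           # m_theta
    hard_questions: frozenset  # H_theta, a subset of Q = {0, ..., n-1}
    weight: float              # w_theta = p(theta) * v_theta


def generate_test_game(n, num_types, m_max, rng):
    """Draw one random test game instance following the quoted setup.

    n         -- number of questions
    num_types -- number of types L
    m_max     -- maximum memory size
    rng       -- a random.Random instance, so runs can be seeded
    """
    b_max = 2 * m_max  # the setup always fixes b_max to 2 * m_max
    if b_max > n:
        # Assumption: H_theta is a set, so it cannot exceed n questions.
        raise ValueError("2 * m_max must not exceed n")
    types = []
    for _ in range(num_types):
        m = rng.randint(1, m_max)       # m_theta uniform on 1..m_max
        size = rng.randint(m, b_max)    # |H_theta| uniform on m_theta..b_max
        # Assumption: questions are drawn without replacement.
        hard = frozenset(rng.sample(range(n), size))
        w = rng.uniform(0.0, 1.0)       # w_theta uniform on [0, 1]
        types.append(TakerType(m, hard, w))
    return types


# Five seeded instances per data point, matching the quoted protocol.
instances = [generate_test_game(20, 10, 4, random.Random(seed)) for seed in range(5)]
```

Passing an explicit `random.Random` keeps each instance reproducible from its seed, which the original description does not specify. The sketch covers instance generation only; the timing, the 5-second timeout, and the solvers are outside its scope.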