Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Game-Theoretic Question Selection for Tests

Authors: Yuqian Li, Vincent Conitzer

JAIR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments are also provided for those proposed algorithms to show their scalability and the increase of the tester s utility relative to that of the uniform-at-random strategy.
Researcher Affiliation Academia Yuqian Li EMAIL Vincent Conitzer EMAIL Duke University Durham, NC 27707 USA
Pseudocode Yes Algorithm 1 Input: A binary test game with t = 1 and an optimal primal solution (U, (zθ,q)) to LP (4). 1: T {q | U0 q P θ:q Hθ p(θ)vθzθ,q = U} 2: S {θ : P q Hθ T zθ,q < mθ} 3: let all θ be unmarked 4: while S has an unmarked element do 5: θ an unmarked element from S 6: for all q Hθ T and zθ,q < 1 do 7: S S {θ Θ : q Hθ zθ ,q > 0} 8: T T \ {q} 11: end while 12: return the uniform distribution over T
Open Source Code No The paper does not provide any explicit statement about making the source code available, nor does it include a link to a code repository.
Open Datasets No For each experimental data point, we specify three parameters: the number of questions n, the number of types L (|Θ| = L), and the maximum memory size mmax. We always set bmax, the maximum size of any Hθ, to 2mmax. Given those parameters, a test game instance is randomly generated as follows: for each θ Θ, draw mθ uniformly from 1 to mmax; draw |Hθ| uniformly from mθ to bmax; generate Hθ by drawing |Hθ| elements from Q uniformly; and draw wθ = p(θ)vθ uniformly from [0, 1] (these two factors always appear together).
Dataset Splits No For each data point, we generate 5 test game instances and compute the average running time for each algorithm. We set a timeout of 5 seconds for each instance. Figures 2(a), 2(b), and 2(c) show how the algorithms scale in n, L, and mmax, respectively, holding the other parameters fixed.
Hardware Specification Yes Our machine has an Intel i7-2600 3.40GHz CPU and 8GB memory.
Software Dependencies Yes In particular, we use CPLEX 12.6.0.0 and the boost 1.46.1 C++ library for Edmonds Karp and Push-Relabel.
Experiment Setup Yes For each experimental data point, we specify three parameters: the number of questions n, the number of types L (|Θ| = L), and the maximum memory size mmax. We always set bmax, the maximum size of any Hθ, to 2mmax. Given those parameters, a test game instance is randomly generated as follows: for each θ Θ, draw mθ uniformly from 1 to mmax; draw |Hθ| uniformly from mθ to bmax; generate Hθ by drawing |Hθ| elements from Q uniformly; and draw wθ = p(θ)vθ uniformly from [0, 1] (these two factors always appear together). For each data point, we generate 5 test game instances and compute the average running time for each algorithm. We set a timeout of 5 seconds for each instance. We use the network-flow approach from Definition 4 with binary search on U to a precision of 10 8