Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Game-Theoretic Question Selection for Tests
Authors: Yuqian Li, Vincent Conitzer
JAIR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are also provided for those proposed algorithms to show their scalability and the increase of the tester s utility relative to that of the uniform-at-random strategy. |
| Researcher Affiliation | Academia | Yuqian Li EMAIL Vincent Conitzer EMAIL Duke University Durham, NC 27707 USA |
| Pseudocode | Yes | Algorithm 1 Input: A binary test game with t = 1 and an optimal primal solution (U, (zθ,q)) to LP (4). 1: T {q | U0 q P θ:q Hθ p(θ)vθzθ,q = U} 2: S {θ : P q Hθ T zθ,q < mθ} 3: let all θ be unmarked 4: while S has an unmarked element do 5: θ an unmarked element from S 6: for all q Hθ T and zθ,q < 1 do 7: S S {θ Θ : q Hθ zθ ,q > 0} 8: T T \ {q} 11: end while 12: return the uniform distribution over T |
| Open Source Code | No | The paper does not provide any explicit statement about making the source code available, nor does it include a link to a code repository. |
| Open Datasets | No | For each experimental data point, we specify three parameters: the number of questions n, the number of types L (|Θ| = L), and the maximum memory size mmax. We always set bmax, the maximum size of any Hθ, to 2mmax. Given those parameters, a test game instance is randomly generated as follows: for each θ Θ, draw mθ uniformly from 1 to mmax; draw |Hθ| uniformly from mθ to bmax; generate Hθ by drawing |Hθ| elements from Q uniformly; and draw wθ = p(θ)vθ uniformly from [0, 1] (these two factors always appear together). |
| Dataset Splits | No | For each data point, we generate 5 test game instances and compute the average running time for each algorithm. We set a timeout of 5 seconds for each instance. Figures 2(a), 2(b), and 2(c) show how the algorithms scale in n, L, and mmax, respectively, holding the other parameters fixed. |
| Hardware Specification | Yes | Our machine has an Intel i7-2600 3.40GHz CPU and 8GB memory. |
| Software Dependencies | Yes | In particular, we use CPLEX 12.6.0.0 and the boost 1.46.1 C++ library for Edmonds Karp and Push-Relabel. |
| Experiment Setup | Yes | For each experimental data point, we specify three parameters: the number of questions n, the number of types L (|Θ| = L), and the maximum memory size mmax. We always set bmax, the maximum size of any Hθ, to 2mmax. Given those parameters, a test game instance is randomly generated as follows: for each θ Θ, draw mθ uniformly from 1 to mmax; draw |Hθ| uniformly from mθ to bmax; generate Hθ by drawing |Hθ| elements from Q uniformly; and draw wθ = p(θ)vθ uniformly from [0, 1] (these two factors always appear together). For each data point, we generate 5 test game instances and compute the average running time for each algorithm. We set a timeout of 5 seconds for each instance. We use the network-flow approach from Definition 4 with binary search on U to a precision of 10 8 |