Manipulating a Learning Defender and Ways to Counteract
Authors: Jiarui Gan, Qingyu Guo, Long Tran-Thanh, Bo An, Michael Wooldridge
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored. |
| Researcher Affiliation | Academia | Jiarui Gan, University of Oxford, Oxford, UK, jiarui.gan@cs.ox.ac.uk; Qingyu Guo, Nanyang Technological University, Singapore, qguo005@e.ntu.edu.sg; Long Tran-Thanh, University of Southampton, Southampton, UK, l.tran-thanh@soton.ac.uk; Bo An, Nanyang Technological University, Singapore, boan@ntu.edu.sg; Michael Wooldridge, University of Oxford, Oxford, UK, mjw@cs.ox.ac.uk |
| Pseudocode | Yes | Algorithm 1: Decide if there exists a policy π such that EoP(π) ≥ ξ. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | In our evaluations, attacker types are randomly generated using the covariance model [15], with a parameter ρ ∈ [0, 1] to control the closeness of the generated game to a zero-sum game. |
| Dataset Splits | No | The paper describes how attacker types are generated but does not specify dataset split information such as percentages or counts for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | In our evaluations, attacker types are randomly generated using the covariance model [15], with a parameter ρ ∈ [0, 1] to control the closeness of the generated game to a zero-sum game. That is, we shift each payoff parameter x towards the corresponding one y of a zero-sum attacker type, letting x ← (1 − ρ) x + ρ y. All results shown are the average of at least 50 runs. Figure 1 (a) and (b) shows the variance of the EoP with respect to ρ and the size of the game. Except for the QR policy with ϕ = 10, performance of all other policies is very close to each other, though there is a discernible gap between the optimal policy and the SSE policy. In (a), results are obtained with other parameters set to λ = 100, m = 10, and n = 50; and in (b) with m = n/5, ρ = 0.5, and λ = 100. (A generation sketch follows the table.) |
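
The Experiment Setup row describes how attacker types are generated with the covariance model: each attacker payoff x is shifted towards the corresponding zero-sum value y via x ← (1 − ρ) x + ρ y. The sketch below illustrates that interpolation. The payoff ranges, the four-parameter security-game payoff structure, and the function name are illustrative assumptions; they are not specified in the excerpt quoted above.

```python
import numpy as np

def generate_attacker_type(n_targets, rho, seed=None):
    """Illustrative sketch of covariance-style attacker-type generation.

    rho = 0 yields an independently drawn attacker type; rho = 1 yields an
    attacker whose payoffs are exactly the zero-sum counterpart of the
    defender's. Payoff ranges below are assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)

    # Defender payoffs per target: reward if the target is covered,
    # penalty if it is left uncovered (assumed ranges).
    def_reward = rng.uniform(0, 10, n_targets)
    def_penalty = rng.uniform(-10, 0, n_targets)

    # Independently drawn attacker payoffs: reward if the attacked target
    # is uncovered, penalty if it is covered (assumed ranges).
    atk_reward = rng.uniform(0, 10, n_targets)
    atk_penalty = rng.uniform(-10, 0, n_targets)

    # Zero-sum counterpart: the attacker's payoff is the negative of the
    # defender's payoff for the same outcome.
    zs_reward = -def_penalty    # attack succeeds on an uncovered target
    zs_penalty = -def_reward    # attack fails on a covered target

    # Shift each attacker payoff x towards the zero-sum value y:
    # x <- (1 - rho) * x + rho * y, as in the quoted setup.
    atk_reward = (1 - rho) * atk_reward + rho * zs_reward
    atk_penalty = (1 - rho) * atk_penalty + rho * zs_penalty

    return def_reward, def_penalty, atk_reward, atk_penalty
```

Averaging results over repeated calls with fresh seeds mirrors the "average of at least 50 runs" reported in the setup.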