Manipulating a Learning Defender and Ways to Counteract

Authors: Jiarui Gan, Qingyu Guo, Long Tran-Thanh, Bo An, Michael Wooldridge

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation shows that our approaches can improve the defender's utility significantly as compared to the situation when attacker manipulation is ignored.
Researcher Affiliation | Academia | Jiarui Gan (University of Oxford, Oxford, UK, jiarui.gan@cs.ox.ac.uk); Qingyu Guo (Nanyang Technological University, Singapore, qguo005@e.ntu.edu.sg); Long Tran-Thanh (University of Southampton, Southampton, UK, l.tran-thanh@soton.ac.uk); Bo An (Nanyang Technological University, Singapore, boan@ntu.edu.sg); Michael Wooldridge (University of Oxford, Oxford, UK, mjw@cs.ox.ac.uk)
Pseudocode | Yes | Algorithm 1: Decide if there exists a policy π such that EoP(π) ≥ ξ. (An illustrative sketch of how such a decision procedure can be used appears after the table.)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | In our evaluations, attacker types are randomly generated using the covariance model [15], with a parameter ρ ∈ [0, 1] to control the closeness of the generated game to a zero-sum game.
Dataset Splits | No | The paper describes how attacker types are generated but does not specify dataset split information such as percentages or counts for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | In our evaluations, attacker types are randomly generated using the covariance model [15], with a parameter ρ ∈ [0, 1] to control the closeness of the generated game to a zero-sum game. That is, we shift each payoff parameter x towards the corresponding one y of a zero-sum attacker type, letting x ← (1 − ρ)·x + ρ·y. All results shown are the average of at least 50 runs. Figure 1 (a) and (b) show the variance of the EoP with respect to ρ and the size of the game. Except for the QR policy with ϕ = 10, the performance of all other policies is very close to each other, though there is a discernible gap between the optimal policy and the SSE policy. In (a), results are obtained with the other parameters set to λ = 100, m = 10, and n = 50; in (b), with m = n/5, ρ = 0.5, and λ = 100. (A sketch of this payoff-generation step appears after the table.)
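
Algorithm 1 is reported above only at the level of its decision problem: given a threshold ξ, decide whether a policy π with EoP(π) ≥ ξ exists. As a hedged illustration (not the paper's implementation), the Python sketch below shows one standard way such a decision procedure can be used: a binary search over ξ that calls a hypothetical oracle exists_policy(game, xi). The oracle name, its interface, and the search bounds are assumptions.

# Hedged sketch: using a yes/no decision procedure to approximate the best EoP.
# exists_policy(game, xi) is a HYPOTHETICAL stand-in for a procedure that
# decides "does a policy pi with EoP(pi) >= xi exist?"; its internals and the
# default search bounds lo/hi are assumptions made for illustration only.
def best_eop_threshold(game, exists_policy, lo=0.0, hi=1.0, tol=1e-6):
    """Binary-search the largest threshold xi that is still feasible,
    assuming feasibility is monotone (feasible at xi implies feasible below xi)."""
    if not exists_policy(game, lo):
        return None  # no policy clears even the lowest threshold
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if exists_policy(game, mid):
            lo = mid  # some policy achieves EoP >= mid; try a higher target
        else:
            hi = mid  # mid is infeasible; lower the target
    return lo

The design choice here is the standard reduction from optimization to a sequence of feasibility checks: with a monotone oracle, the loop converges to within tol of the largest feasible threshold.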
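
The experiment-setup row quotes the covariance-model generation of attacker types. The sketch below is a literal reading of that description under stated assumptions (the payoff array layout, the uniform sampling range, and the name generate_attacker_type are illustrative, not the authors' code): random attacker payoffs x are shifted toward the zero-sum values y via x ← (1 − ρ)x + ρy.

import numpy as np

def generate_attacker_type(defender_payoffs, rho, rng=None):
    """Minimal sketch of the quoted covariance-model step: interpolate random
    attacker payoffs toward the zero-sum counterpart of the defender's payoffs.
    The array layout and the uniform range [-1, 1] are illustrative assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(-1.0, 1.0, size=defender_payoffs.shape)  # random attacker payoffs
    y = -defender_payoffs  # zero-sum attacker: negation of the defender's payoffs
    # rho = 0 keeps the random payoffs; rho = 1 yields an exactly zero-sum attacker.
    return (1.0 - rho) * x + rho * y

# Illustrative usage: 50 targets, game shifted halfway toward zero-sum (rho = 0.5).
rng = np.random.default_rng(0)
defender_payoffs = rng.uniform(-1.0, 1.0, size=50)
attacker_type = generate_attacker_type(defender_payoffs, rho=0.5, rng=rng)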