Non-Cooperative Inverse Reinforcement Learning

Authors: Xiangyuan Zhang, Kaiqing Zhang, Erik Miehling, Tamer Basar

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we demonstrate the benefits of our N-CIRL formalism over the existing multi-agent IRL formalism via extensive numerical simulation in a novel cyber security setting." and "3) We demonstrate in a novel cyber security model that the adaptive strategies obtained from N-CIRL outperform strategies obtained from existing multi-agent IRL techniques."
Researcher Affiliation | Academia | "Coordinated Science Laboratory, University of Illinois at Urbana-Champaign {xz7,kzhang66,miehling,basar1}@illinois.edu"
Pseudocode | Yes | "Details of both Υv and Υw using sawtooth functions can be found in the pseudocode in Sec. C in the Appendix." A sketch of sawtooth interpolation follows the table.
Open Source Code | No | The paper neither states that the authors are releasing the code for this work nor links to a source-code repository.
Open Datasets | No | The paper states "The experiments are run on random instances of attack graphs" but gives no access information (link, DOI, repository name, or citation to an established benchmark) for a publicly available dataset.
Dataset Splits | No | The paper gives no dataset split information (percentages, sample counts, predefined splits, or splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not specify the hardware (GPU/CPU models, processor speeds, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not list ancillary software with version numbers (e.g., Python 3.8, CPLEX 12.4) needed to replicate the experiments.
Experiment Setup | Yes | "The threat model in our experiment is based on an attack graph... Each exploit e_ij is associated with a probability of success, β_ij, describing the likelihood of the exploit succeeding (if attempted). ...The attacker's reward is R(s, a, d, s′; θ) = r_e(s, s′; θ) − c_A(a) + c_D(d), where s′ is the updated state, r_e(s, s′; θ) is the attacker's reward for any newly enabled conditions, and c_A(a) and c_D(d) are the costs of the attack and defense actions, respectively. The experiments are run on random instances of attack graphs; see some instances in Figure 1. See Sec. C for more details of the experimental setup." A sketch of this reward computation also follows the table.
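
The Pseudocode row points to Appendix C for the definitions of Υv and Υw; that appendix is not reproduced here. For orientation only, below is a minimal sketch of the standard sawtooth interpolation used to maintain value-function bounds over beliefs in point-based solvers, assuming a NumPy belief representation and a stored point set; the names and structure are our assumptions, not the authors' pseudocode.

```python
import numpy as np

def sawtooth_value(b, corner_vals, points):
    """Sawtooth interpolation of a value-function bound over beliefs.

    b           : query belief, shape (n_states,), entries sum to 1
    corner_vals : bound at each simplex vertex e_s, shape (n_states,)
    points      : list of (b_i, v_i) interior belief/value pairs

    Returns the corners-only interpolation at b, tightened by the best
    linear "sawtooth" through each stored interior point.
    """
    base = float(b @ corner_vals)                # corners-only interpolation
    best = base
    for b_i, v_i in points:
        mask = b_i > 0                           # avoid division by zero
        ratio = np.min(b[mask] / b_i[mask])      # fraction of b_i contained in b
        interp = base + ratio * (v_i - float(b_i @ corner_vals))
        best = min(best, interp)
    return best

# Two states: corners valued 10 and 4; one stored point tightens the bound.
corners = np.array([10.0, 4.0])
pts = [(np.array([0.5, 0.5]), 5.0)]              # corner average there is 7.0
print(sawtooth_value(np.array([0.3, 0.7]), corners, pts))  # ≈ 4.6, below base 5.8
```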
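
The Experiment Setup row fully specifies the attacker's stage reward, so the computation can be sketched directly. The sketch below assumes sets of conditions for states and dictionaries for β_ij, θ, and per-action costs; all identifiers are hypothetical, and treating c_A and c_D as sums of per-action costs is our reading of the quoted formula.

```python
import random

def step_exploits(enabled, attempted, beta, postconditions):
    """Simulate one round of exploit attempts on an attack graph.

    enabled        : frozenset of currently enabled conditions (state s)
    attempted      : iterable of exploit ids chosen by the attacker (action a)
    beta           : exploit id -> success probability beta_ij
    postconditions : exploit id -> set of conditions enabled on success
    Returns the updated state s'.
    """
    new_enabled = set(enabled)
    for e in attempted:
        if random.random() < beta[e]:            # exploit succeeds w.p. beta_ij
            new_enabled |= postconditions[e]
    return frozenset(new_enabled)

def attacker_reward(s, a, d, s_next, theta, attack_cost, defense_cost):
    """Stage reward R(s, a, d, s'; theta) = r_e(s, s'; theta) - c_A(a) + c_D(d).

    theta        : condition -> reward for newly enabling that condition
    attack_cost  : exploit id -> cost contributing to c_A(a)
    defense_cost : defense id -> cost contributing to c_D(d)
    """
    r_e = sum(theta.get(c, 0.0) for c in s_next - s)   # newly enabled conditions only
    c_A = sum(attack_cost[e] for e in a)
    c_D = sum(defense_cost[x] for x in d)
    return r_e - c_A + c_D
```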