Non-Cooperative Inverse Reinforcement Learning
Authors: Xiangyuan Zhang, Kaiqing Zhang, Erik Miehling, Tamer Basar
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, we demonstrate the benefits of our N-CIRL formalism over the existing multi-agent IRL formalism via extensive numerical simulation in a novel cyber security setting." Also: "3) We demonstrate in a novel cyber security model that the adaptive strategies obtained from N-CIRL outperform strategies obtained from existing multi-agent IRL techniques." |
| Researcher Affiliation | Academia | Coordinated Science Laboratory University of Illinois at Urbana-Champaign {xz7,kzhang66,miehling,basar1}@illinois.edu |
| Pseudocode | Yes | Details of both Υv and Υw using sawtooth functions can be found in the pseudocode in Sec. C in the Appendix. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | No | The paper states 'The experiments are run on random instances of attack graphs' but does not provide concrete access information (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | The threat model in our experiment is based on an attack graph... Each exploit e_ij is associated with a probability of success, β_ij, describing the likelihood of the exploit succeeding (if attempted). ...The attacker's reward is R(s, a, d, s'; θ) = r_e(s, s'; θ) - c_A(a) + c_D(d), where s' is the updated state, r_e(s, s'; θ) is the attacker's reward for any newly enabled conditions, and c_A(a) and c_D(d) are costs for attack and defense actions, respectively. The experiments are run on random instances of attack graphs; see some instances in Figure 1. See Sec. C for more details of the experimental setup. |
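The reward structure quoted in the Experiment Setup row can be made concrete with a minimal sketch. All function and variable names below (`attacker_reward`, the toy condition names, costs, and θ values) are illustrative assumptions, not taken from the paper's code; the sketch only mirrors the stated form R(s, a, d, s'; θ) = r_e(s, s'; θ) - c_A(a) + c_D(d):

```python
def attacker_reward(s, s_next, a, d, theta, attack_cost, defense_cost):
    """Reward for newly enabled conditions, minus the attack cost,
    plus the cost the defense action imposes on the defender."""
    newly_enabled = set(s_next) - set(s)  # conditions enabled by this transition
    r_e = sum(theta.get(c, 0.0) for c in newly_enabled)
    return r_e - attack_cost[a] + defense_cost[d]

# Toy instance (hypothetical values): two conditions, one exploit, one defense.
theta = {"root_access": 5.0, "data_exfil": 10.0}   # attacker's private reward parameters
attack_cost = {"exploit_e01": 1.0}
defense_cost = {"patch": 2.0}

r = attacker_reward(
    s={"root_access"},
    s_next={"root_access", "data_exfil"},
    a="exploit_e01",
    d="patch",
    theta=theta,
    attack_cost=attack_cost,
    defense_cost=defense_cost,
)
# r = 10.0 - 1.0 + 2.0 = 11.0
```

The sketch treats θ as a dictionary mapping conditions to rewards, so only conditions newly enabled by the transition s → s' contribute to r_e, matching the quoted description.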