Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning
Authors: Guanlin Liu, Lifeng Lai
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Numerical Results. In this section, we empirically compare the performance of the action poisoning only attack strategy (d-portion attack), the reward poisoning only attack strategy (η-gap attack) and the mixed attack strategy. We consider a simple case of Markov game where m = 2, H = 2 and \|S\| = 3. This Markov game is the example in Appendix F.2. |
| Researcher Affiliation | Academia | Guanlin Liu Lifeng Lai Department of Electrical and Computer Engineering University of California, Davis One Shields Avenue, Davis, CA 95616 EMAIL |
| Pseudocode | Yes | Algorithm 1: Exploration phase for Markov games |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is provided or publicly available. |
| Open Datasets | No | We consider a simple case of Markov game where m = 2, H = 2 and \|S\| = 3. This Markov game is the example in Appendix F.2. |
| Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, citations, or detailed methodology) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions V-learning as an algorithm used but does not provide specific version numbers for any ancillary software components like programming languages, libraries, or solvers. |
| Experiment Setup | Yes | We set the total number of episodes $K = 10^7$. We set the total number of steps $H = 6$. [...] Suppose ADV_BANDIT_UPDATE of V-learning follows Algorithm 3 in Appendix J.2 and it chooses hyper-parameter $w_t = \alpha_t \left(\prod_{i=2}^{t} (1 - \alpha_i)\right)^{-1}$, $\gamma_t = \sqrt{\frac{[\ldots]}{Bt}}$ and $\alpha_t = \frac{H+1}{H+t}$. |
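
The learning-rate schedule quoted in the Experiment Setup row can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the extracted formulas read as α_t = (H+1)/(H+t) and w_t = α_t divided by the product of (1 − α_i) for i = 2..t, and it covers only these two quantities. The function names `alpha` and `weights` are hypothetical, chosen for illustration.

```python
from fractions import Fraction


def alpha(t: int, H: int) -> Fraction:
    """Step size alpha_t = (H + 1) / (H + t), for t >= 1 (assumed reading)."""
    return Fraction(H + 1, H + t)


def weights(T: int, H: int) -> list[Fraction]:
    """Return [w_1, ..., w_T] with w_t = alpha_t / prod_{i=2}^{t} (1 - alpha_i).

    For t = 1 the product is empty and equals 1, so w_1 = alpha_1 = 1.
    """
    out = []
    prod = Fraction(1)  # running product of (1 - alpha_i), i = 2..t
    for t in range(1, T + 1):
        if t >= 2:
            prod *= 1 - alpha(t, H)  # equals (t - 1) / (H + t), never zero
        out.append(alpha(t, H) / prod)
    return out


if __name__ == "__main__":
    # H = 6 matches the number of steps quoted in the table.
    for t, w in enumerate(weights(5, H=6), start=1):
        print(t, alpha(t, 6), w)
```

Exact fractions are used here so that the first few values can be verified by hand; for large t the weights grow quickly, and a floating-point or log-space version would likely be preferable.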