Optimal Attack and Defense for Reinforcement Learning

Authors: Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate our frameworks with a classical grid-world shortest path problem with obstacles. Here, each state is a cell in an n × n grid... We test our methods on a 10 × 10 grid world... The agent receives 100, 0, and 160 value from each attack respectively."
Researcher Affiliation | Academia | University of Wisconsin-Madison; jmcmahan@wisc.edu, yw@cs.wisc.edu, jerryzhu@cs.wisc.edu, qiaomin.xie@wisc.edu
Pseudocode | Yes | Algorithm 1: Attacker Interaction Protocol. (A hedged Python sketch of such an interaction loop appears below the table.)
Open Source Code | No | The paper does not include any explicit statement about making the source code available, nor does it provide a link to a code repository for its methodology.
Open Datasets | No | "Here, each state is a cell in an n × n grid. Some grid cells are filled with lava and so dangerous to the victim... Here, we test our methods on a 10 × 10 grid world with H = 20 so that the victim has enough time to reach the goal and stay there." The paper describes a custom grid-world environment for its experiments but does not provide access information (link, DOI, or citation) for a publicly available or open dataset.
Dataset Splits | No | The paper does not specify training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU models, or memory) used for its experiments.
Software Dependencies | No | The paper mentions "standard RL techniques" and "standard zero-sum POTBSG planning or distributed learning algorithms" but does not name specific software packages or version numbers needed to reproduce the dependency stack.
Experiment Setup | No | The paper describes the grid-world setup (a 10 × 10 grid with horizon H = 20) but does not provide hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training configurations needed to reproduce the experiments. (A hedged re-implementation sketch of the grid world appears below the table.)
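
For context on the Pseudocode row: the paper's Algorithm 1 is named "Attacker Interaction Protocol", but its body is not reproduced here. The following is a minimal Python sketch of one plausible attacker-mediated interaction loop, written only to illustrate the general shape of such a protocol; the OnlineAttacker class, the perturb_state/perturb_reward hooks, and the env/victim interfaces are all assumptions, not the paper's definitions.

    class OnlineAttacker:
        """Illustrative online attacker; NOT the paper's Algorithm 1.
        Assumption: the attacker sits between the environment and the
        victim, intercepting each observation and reward."""

        def perturb_state(self, s):
            # Assumption: the attacker may replace the observed state
            # (identity here; a real attack would pick s_tilde adversarially).
            return s

        def perturb_reward(self, r):
            # Assumption: the attacker may replace the observed reward.
            return r

    def run_episode(env, victim, attacker, horizon=20):
        """Attacker-mediated interaction: the victim only ever sees the
        perturbed state s_tilde and perturbed reward r_tilde."""
        s = env.reset()
        for _ in range(horizon):
            s_tilde = attacker.perturb_state(s)   # victim observes s~, not s
            a = victim.act(s_tilde)               # victim acts on corrupted view
            s, r, done = env.step(a)
            r_tilde = attacker.perturb_reward(r)  # victim receives r~, not r
            victim.update(s_tilde, a, r_tilde)
            if done:
                break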
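
Likewise, for the Open Datasets and Experiment Setup rows: the quoted 10 × 10, H = 20 lava grid world is simple enough to approximate even without released code. The sketch below assumes a start cell at (0, 0), a single goal cell, a terminal lava penalty, and a unit goal reward; the paper's exact reward values (the quoted 100, 0, and 160 attack values) and lava layout are not specified in the excerpts, so these numbers are placeholders.

    class LavaGridWorld:
        """Minimal n x n grid world with lava obstacles and horizon H.
        Layout, rewards, and dynamics are illustrative assumptions."""

        ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

        def __init__(self, n=10, horizon=20, lava=frozenset(), goal=(9, 9)):
            self.n, self.horizon = n, horizon
            self.lava, self.goal = lava, goal

        def reset(self):
            self.t = 0
            self.pos = (0, 0)  # assumed start cell
            return self.pos

        def step(self, a):
            self.t += 1
            dr, dc = self.ACTIONS[a]
            r, c = self.pos
            # Moves that would leave the grid keep the agent in place.
            self.pos = (min(max(r + dr, 0), self.n - 1),
                        min(max(c + dc, 0), self.n - 1))
            if self.pos in self.lava:
                return self.pos, -1.0, True            # assumed lava penalty
            reward = 1.0 if self.pos == self.goal else 0.0
            return self.pos, reward, self.t >= self.horizon  # H = 20 per the quote

This environment's (state, reward, done) step interface plugs directly into the run_episode loop sketched above.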