Optimal Attack and Defense for Reinforcement Learning
Authors: Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments: We illustrate our frameworks with a classical grid-world shortest path problem with obstacles. Here, each state is a cell in an n × n grid... We test our methods on a 10 × 10 grid world... The agent receives 100, 0, and 160 value from each attack respectively. |
| Researcher Affiliation | Academia | University of Wisconsin-Madison jmcmahan@wisc.edu, yw@cs.wisc.edu, jerryzhu@cs.wisc.edu, qiaomin.xie@wisc.edu |
| Pseudocode | Yes | Algorithm 1: Attacker Interaction Protocol |
| Open Source Code | No | The paper does not include any explicit statement about making the source code available, nor does it provide a link to a code repository for its methodology. |
| Open Datasets | No | Here, each state is a cell in an n × n grid. Some grid cells are filled with lava and so dangerous to the victim... Here, we test our methods on a 10 × 10 grid world with H = 20 so that the victim has enough time to reach the goal and stay there. The paper describes a custom grid-world environment for its experiments but does not provide access information (link, DOI, citation) to a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific details regarding training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for data partitioning. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU models, or memory) used for conducting its experiments. |
| Software Dependencies | No | The paper mentions using 'standard RL techniques' and 'standard zero-sum POTBSG planning or distributed learning algorithms' but does not provide specific software names with version numbers or reproducible dependency information. |
| Experiment Setup | No | The paper describes the grid-world setup (a 10 × 10 grid with horizon H = 20) but does not provide specific hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training configurations needed to reproduce the experiments. |