Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
Authors: Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, Adish Singla
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform numerical simulations on an environment represented as an MDP with four states and two actions; see Figure 2 for details. |
| Researcher Affiliation | Academia | Max Planck Institute for Software Systems (MPI-SWS) and University of Wisconsin-Madison. |
| Pseudocode | No | The paper describes its methods through mathematical formulations and prose, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Implementation details and code are provided in the supplementary materials. |
| Open Datasets | No | The environment is custom-defined and described in Figure 2 and Section 6, but no public access information (link, citation) is provided for it as a dataset. |
| Dataset Splits | No | The paper conducts simulations within a defined MDP environment and does not describe dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the UCRL learning algorithm but does not specify any software libraries or packages with version numbers used for implementation. |
| Experiment Setup | Yes | The regularity parameter δ used when solving the dynamic poisoning attack problems is set to 0.0001. In the experiments, we fix R(s0, ·) = 2.5 and ε = 0.1. |
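
For readers attempting a re-implementation, the sketch below shows one way the environment described in the table (an MDP with four states and two actions, R(s0, ·) = 2.5, ε = 0.1, δ = 0.0001) could be encoded. Only those quoted values come from the paper; the transition probabilities, the remaining rewards, and the `step` helper are illustrative assumptions, since the actual dynamics are specified in Figure 2 of the paper rather than reproduced here.

```python
# Minimal sketch of the four-state, two-action MDP used in the paper's simulations.
# Values taken from the paper: |S| = 4, |A| = 2, R(s0, .) = 2.5, epsilon = 0.1, delta = 1e-4.
# All other numbers below are placeholders, NOT the Figure 2 values.
import numpy as np

N_STATES, N_ACTIONS = 4, 2
EPSILON = 0.1   # margin parameter fixed in the experiments
DELTA = 1e-4    # regularity parameter for the poisoning optimization problems

# Reward table R[s, a]; the paper fixes R(s0, a) = 2.5 for both actions.
R = np.zeros((N_STATES, N_ACTIONS))
R[0, :] = 2.5
# Remaining rewards are invented for illustration only.
R[1, :] = [0.0, 1.0]
R[2, :] = [0.5, 0.0]
R[3, :] = [1.0, 0.5]

# Transition tensor P[s, a, s']; each (s, a) row is a valid distribution.
# Uniform dynamics are used here purely as a placeholder.
P = np.full((N_STATES, N_ACTIONS, N_STATES), 1.0 / N_STATES)
assert np.allclose(P.sum(axis=2), 1.0)

def step(state: int, action: int, rng: np.random.Generator):
    """Sample one transition from the sketched MDP."""
    next_state = rng.choice(N_STATES, p=P[state, action])
    return next_state, R[state, action]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = 0
    for _ in range(5):
        a = int(rng.integers(N_ACTIONS))
        s, r = step(s, a, rng)
        print(f"action={a} -> state={s}, reward={r}")
```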