Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Authors: Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we formulate the reward-poisoning problem as an optimal control problem on a higher-level attack MDP, and develop computational tools based on DRL that are able to find efficient attack policies across a variety of environments. ... In this section, we make empirical comparisons between a number of attack policies φ. ... In all of our experiments, we assume a standard Q-learning RL agent with parameters: Q_0 = 0_{S×A}, ε = 0.1, γ = 0.9, α_t = 0.9 ∀t. The plots show 1 standard error around each curve (some are difficult to see).
Researcher Affiliation | Academia | ¹University of Wisconsin-Madison, ²Max Planck Institute for Software Systems (MPI-SWS).
Pseudocode | Yes | Algorithm 1: Reward Poisoning against Q-learning; Algorithm 2: The Non-Adaptive Attack φsas; Algorithm 3: The Fast Adaptive Attack (FAA). (A hedged sketch of the Algorithm 1 interaction loop follows the table.)
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper uses standard environments like a "chain MDP" and a "Grid World". While these environments are well-defined, the paper does not refer to them as publicly available datasets with specific access information (link, DOI, or formal citation to a dataset repository).
Dataset Splits | No | The paper describes experiments within simulated environments (MDPs) for reinforcement learning. It does not mention explicit training/validation/test dataset splits as would typically apply to static datasets in supervised learning.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., CPU/GPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions "Twin Delayed DDPG (TD3)" but does not specify its version number or any other software dependencies with version information.
Experiment Setup | Yes | In all of our experiments, we assume a standard Q-learning RL agent with parameters: Q_0 = 0_{S×A}, ε = 0.1, γ = 0.9, α_t = 0.9 ∀t. (A sketch of this agent configuration follows the table.)
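
A minimal sketch of the Q-learning agent described in the experiment setup row, assuming a tabular environment with integer-indexed states and actions. The class name and interface are illustrative and not taken from the paper's code; only the parameter values (Q_0 = 0_{S×A}, ε = 0.1, γ = 0.9, α_t = 0.9 ∀t) come from the quoted setup.

```python
import numpy as np

class QLearningAgent:
    """Tabular epsilon-greedy Q-learning with the reported parameters:
    Q_0 = 0_{S x A}, epsilon = 0.1, gamma = 0.9, alpha_t = 0.9 for all t."""

    def __init__(self, n_states, n_actions, eps=0.1, gamma=0.9, alpha=0.9, seed=0):
        self.Q = np.zeros((n_states, n_actions))  # Q_0 = 0_{S x A}
        self.eps, self.gamma, self.alpha = eps, gamma, alpha
        self.rng = np.random.default_rng(seed)

    def act(self, s):
        # epsilon-greedy action selection (epsilon = 0.1)
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, r, s_next):
        # standard Q-learning update with constant step size alpha_t = 0.9
        td_target = r + self.gamma * np.max(self.Q[s_next])
        self.Q[s, a] += self.alpha * (td_target - self.Q[s, a])
```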
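The pseudocode row lists Algorithm 1, "Reward Poisoning against Q-learning", in which the attacker perturbs the reward signal before the learner's Q-update. Below is a hedged sketch of that interaction loop; it reuses the QLearningAgent sketch above, and the `env` and `attack_policy` interfaces are placeholders, not the paper's implementation.

```python
def run_poisoned_episode(env, agent, attack_policy, max_steps=1000):
    """One episode in which every reward is poisoned before the Q-update.

    Assumed interfaces (not from the paper):
      env.reset() -> state
      env.step(action) -> (next_state, reward, done)
      attack_policy(Q, s, a, r, s_next) -> perturbation delta_t
    """
    s = env.reset()
    for _ in range(max_steps):
        a = agent.act(s)
        s_next, r, done = env.step(a)
        # The attack policy observes the experience (and, if adaptive,
        # the learner's current Q-table) and chooses a reward perturbation.
        delta = attack_policy(agent.Q, s, a, r, s_next)
        # The learner only ever sees the poisoned reward r + delta.
        agent.update(s, a, r + delta, s_next)
        if done:
            break
        s = s_next
```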