Policy Poisoning in Batch Reinforcement Learning and Control
Authors: Yuzhe Ma, Xuezhou Zhang, Wen Sun, Jerry Zhu
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show the effectiveness of policy poisoning attacks. |
| Researcher Affiliation | Collaboration | Yuzhe Ma (University of Wisconsin-Madison), Xuezhou Zhang (University of Wisconsin-Madison), Wen Sun (Microsoft Research New York), Xiaojin Zhu (University of Wisconsin-Madison) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code can be found in https://github.com/myzwisc/PPRL_NeurIPS19. |
| Open Datasets | No | The paper describes how the training data was generated for each experiment (e.g., 'consists of 4 tuples', 'single item for every state-action pair', 'simulate a total of 400 time steps') but does not provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes how the training data is generated and then uses poisoned versions of it, but does not specify dataset splits (e.g., train/validation/test percentages or counts) or reference predefined splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'CVXPY [8] to implement the optimization' but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | Experiment 1. ... The discounting factor is γ = 0.9. ... The attacker sets ε = 1 and uses α = 2, i.e. ‖r − r⁰‖₂ as the attack cost. ... Experiment 4. ... we let h = 0.1, m = 1, η = 0.5, and w_t ∼ N(0, σ²I) with σ = 0.01. ... we let γ = 0.9 for solving the optimal control policy in (21). ... We run our attack (27)-(33) with α = 2 and ε = 0.01 in (32). |
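
The attack cost quoted in the Experiment Setup row is a norm of the difference between the poisoned rewards r and the original rewards r⁰. The following is a minimal sketch of that quantity in plain Python, not the authors' CVXPY implementation. The function name `attack_cost`, its parameter `alpha` (the norm order), and the sample reward values are assumptions made for illustration only.

```python
import math


def attack_cost(r, r0, alpha=2.0):
    """Return the l_alpha norm of the reward perturbation r - r0.

    `r` and `r0` are equal-length sequences of poisoned and original
    rewards. `alpha` is the norm order (hypothetical parameter name).
    """
    if len(r) != len(r0):
        raise ValueError("r and r0 must have the same length")
    diffs = [abs(a - b) for a, b in zip(r, r0)]
    if math.isinf(alpha):
        # The infinity norm is the largest absolute change to any reward.
        return max(diffs, default=0.0)
    return sum(d ** alpha for d in diffs) ** (1.0 / alpha)


# Made-up rewards for a four-item batch (not data from the paper).
r0 = [0.0, 1.0, -1.0, 0.5]
r = [0.0, 4.0, 3.0, 0.5]

cost_l2 = attack_cost(r, r0, alpha=2)          # Euclidean cost
cost_l1 = attack_cost(r, r0, alpha=1)          # total absolute change
cost_inf = attack_cost(r, r0, alpha=math.inf)  # largest single change
```

With a norm order of 2 the cost penalizes large changes to individual rewards more heavily than an order of 1 does, so the choice of order shapes how an attacker would spread modifications across the batch.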