reproducibilityindex.ai

Policy Poisoning in Batch Reinforcement Learning and Control

Authors: Yuzhe Ma, Xuezhou Zhang, Wen Sun, Jerry Zhu

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments show the effectiveness of policy poisoning attacks.
Researcher Affiliation	Collaboration	Yuzhe Ma (University of Wisconsin-Madison), Xuezhou Zhang (University of Wisconsin-Madison), Wen Sun (Microsoft Research New York), Xiaojin Zhu (University of Wisconsin-Madison)
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	All code can be found in https://github.com/myzwisc/PPRL_NeurIPS19.
Open Datasets	No	The paper describes how the training data was generated for each experiment (e.g., 'consists of 4 tuples', 'single item for every state-action pair', 'simulate a total of 400 time steps') but does not provide concrete access information for a publicly available or open dataset.
Dataset Splits	No	The paper describes the generation of training data and then uses poisoned versions of it, but does not provide specific details on dataset split information (e.g., train/validation/test percentages or counts) or reference predefined splits for reproducibility.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper mentions using 'CVXPY [8] to implement the optimization' but does not provide specific version numbers for this or any other software dependencies.
Experiment Setup	Yes	Experiment 1. ... The discounting factor is γ = 0.9. ... The attacker sets ε = 1 and uses α = 2, i.e. ‖r − r⁰‖₂ as the attack cost. ... Experiment 4. ... we let h = 0.1, m = 1, η = 0.5, and w_t ∼ N(0, σ²I) with σ = 0.01. ... we let γ = 0.9 for solving the optimal control policy in (21). ... We run our attack (27)-(33) with α = 2 and ε = 0.01 in (32).
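
The Experiment Setup row quotes two quantities that are easy to misread in extracted text: a 2-norm attack cost on the reward vector and zero-mean Gaussian noise with σ = 0.01. The sketch below illustrates both with NumPy. It is not the authors' code (which uses CVXPY and lives in the linked repository). The function names, the example reward vectors, the 2-dimensional noise, and the fixed seed are all hypothetical; only the norm order 2, σ = 0.01, and the 400 time steps come from the table above.

```python
import numpy as np

SIGMA = 0.01    # noise standard deviation quoted for Experiment 4
NORM_ORDER = 2  # norm order quoted for the attack cost


def attack_cost(r_poisoned, r_clean, order=NORM_ORDER):
    """Return the `order`-norm distance between poisoned and clean rewards."""
    diff = np.asarray(r_poisoned, dtype=float) - np.asarray(r_clean, dtype=float)
    return float(np.linalg.norm(diff, ord=order))


def sample_noise(n_steps, dim, sigma=SIGMA, seed=0):
    """Draw n_steps independent samples from N(0, sigma^2 I) in `dim` dimensions."""
    rng = np.random.default_rng(seed)  # fixed seed is an assumption, for repeatability
    return rng.normal(loc=0.0, scale=sigma, size=(n_steps, dim))


# Hypothetical reward vectors, chosen only to make the arithmetic easy to check.
r_clean = [0.0, 0.0, 1.0, 1.0]
r_poisoned = [0.3, 0.0, 1.0, 0.6]
cost = attack_cost(r_poisoned, r_clean)

# 400 time steps as quoted in the Open Datasets row; dim=2 is an assumption.
noise = sample_noise(n_steps=400, dim=2)
```

Here the reward difference is (0.3, 0, 0, −0.4), so the 2-norm cost is 0.5. Passing a different `order` gives the cost under another norm.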