Optimal Attack and Defense for Reinforcement Learning

Authors: Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate our frameworks with a classical grid-world shortest path problem with obstacles. Here, each state is a cell in an n × n grid... We test our methods on a 10 × 10 grid world... The agent receives 100, 0, and 160 value from each attack respectively."
Researcher Affiliation | Academia | University of Wisconsin-Madison; jmcmahan@wisc.edu, yw@cs.wisc.edu, jerryzhu@cs.wisc.edu, qiaomin.xie@wisc.edu
Pseudocode | Yes | Algorithm 1: Attacker Interaction Protocol. (A hedged Python sketch of such an interaction loop appears below the table.)
Open Source Code | No | The paper does not include any explicit statement about making the source code available, nor does it provide a link to a code repository for its methodology.
Open Datasets | No | "Here, each state is a cell in an n × n grid. Some grid cells are filled with lava and so dangerous to the victim... Here, we test our methods on a 10 × 10 grid world with H = 20 so that the victim has enough time to reach the goal and stay there." The paper describes a custom grid-world environment for its experiments but does not provide access information (link, DOI, or citation) for a publicly available or open dataset.
Dataset Splits | No | The paper does not specify training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU models, or memory) used for its experiments.
Software Dependencies | No | The paper mentions "standard RL techniques" and "standard zero-sum POTBSG planning or distributed learning algorithms" but does not name specific software packages or version numbers needed to reproduce the dependency stack.
Experiment Setup | No | The paper describes the grid-world setup (a 10 × 10 grid with horizon H = 20) but does not provide hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training configurations needed to reproduce the experiments. (A hedged re-implementation sketch of the grid world appears below the table.)
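
For context on the Pseudocode row: the paper's Algorithm 1 is named "Attacker Interaction Protocol", but its body is not reproduced here. The following is a minimal Python sketch of one plausible attacker-mediated interaction loop, written only to illustrate the general shape of such a protocol; the OnlineAttacker class, the perturb_state/perturb_reward hooks, and the env/victim interfaces are all assumptions, not the paper's definitions.

    class OnlineAttacker:
        """Illustrative online attacker; NOT the paper's Algorithm 1.
        Assumption: the attacker sits between the environment and the
        victim, intercepting each observation and reward."""

        def perturb_state(self, s):
            # Assumption: the attacker may replace the observed state
            # (identity here; a real attack would pick s_tilde adversarially).
            return s

        def perturb_reward(self, r):
            # Assumption: the attacker may replace the observed reward.
            return r

    def run_episode(env, victim, attacker, horizon=20):
        """Attacker-mediated interaction: the victim only ever sees the
        perturbed state s_tilde and perturbed reward r_tilde."""
        s = env.reset()
        for _ in range(horizon):
            s_tilde = attacker.perturb_state(s)   # victim observes s~, not s
            a = victim.act(s_tilde)               # victim acts on corrupted view
            s, r, done = env.step(a)
            r_tilde = attacker.perturb_reward(r)  # victim receives r~, not r
            victim.update(s_tilde, a, r_tilde)
            if done:
                break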
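
Likewise, for the Open Datasets and Experiment Setup rows: the quoted 10 × 10, H = 20 lava grid world is simple enough to approximate even without released code. The sketch below assumes a start cell at (0, 0), a single goal cell, a terminal lava penalty, and a unit goal reward; the paper's exact reward values (the quoted 100, 0, and 160 attack values) and lava layout are not specified in the excerpts, so these numbers are placeholders.

    class LavaGridWorld:
        """Minimal n x n grid world with lava obstacles and horizon H.
        Layout, rewards, and dynamics are illustrative assumptions."""

        ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

        def __init__(self, n=10, horizon=20, lava=frozenset(), goal=(9, 9)):
            self.n, self.horizon = n, horizon
            self.lava, self.goal = lava, goal

        def reset(self):
            self.t = 0
            self.pos = (0, 0)  # assumed start cell
            return self.pos

        def step(self, a):
            self.t += 1
            dr, dc = self.ACTIONS[a]
            r, c = self.pos
            # Moves that would leave the grid keep the agent in place.
            self.pos = (min(max(r + dr, 0), self.n - 1),
                        min(max(c + dc, 0), self.n - 1))
            if self.pos in self.lava:
                return self.pos, -1.0, True            # assumed lava penalty
            reward = 1.0 if self.pos == self.goal else 0.0
            return self.pos, reward, self.t >= self.horizon  # H = 20 per the quote

This environment's (state, reward, done) step interface plugs directly into the run_episode loop sketched above.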