Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Optimal Attack and Defense for Reinforcement Learning
Authors: Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments We illustrate our frameworks with a classical grid-world shortest path problem with obstacles. Here, each state is a cell in a n × n grid... We test our methods on a 10 × 10 grid world... The agent receives 100, 0, and 160 value from each attack respectively. |
| Researcher Affiliation | Academia | University of Wisconsin-Madison EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Attacker Interaction Protocol |
| Open Source Code | No | The paper does not include any explicit statement about making the source code available, nor does it provide a link to a code repository for its methodology. |
| Open Datasets | No | Here, each state is a cell in a n × n grid. Some grid cells are filled with lava and so dangerous to the victim... Here, we test our methods on a 10 × 10 grid world with H = 20 so that the victim has enough time to reach the goal and stay there. The paper describes a custom grid-world environment for its experiments but does not provide access information (link, DOI, citation) to a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific details regarding training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed for data partitioning. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU models, or memory) used for conducting its experiments. |
| Software Dependencies | No | The paper mentions using 'standard RL techniques' and 'standard zero-sum POTBSG planning or distributed learning algorithms' but does not provide specific software names with version numbers or reproducible dependency information. |
| Experiment Setup | No | The paper describes the grid-world setup (e.g., a 10 × 10 grid and horizon H = 20) but does not provide specific hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training configurations for reproducing the experiment setup. |
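The Research Type, Open Datasets, and Experiment Setup rows describe only the shape of the environment: an n × n grid with lava cells, a goal, and horizon H = 20. As a rough illustration of what a reproducer would have to reconstruct, the following is a minimal sketch of the victim's shortest-path problem solved by finite-horizon backward induction. Only N = 10 and H = 20 come from the report; the start and goal cells, the lava layout, the five-action set, absorbing lava, and the reward function are hypothetical choices. The sketch covers neither the attacks nor the defenses studied in the paper and is not the authors' code.

```python
from itertools import product

# Hypothetical grid world in the spirit of the paper's description.
# N and H follow the reported setup; START, GOAL, LAVA, ACTIONS, and the
# reward function are illustrative assumptions, not values from the paper.
N = 10
H = 20
START = (0, 0)
GOAL = (N - 1, N - 1)
LAVA = {(2, 5), (2, 6), (2, 7), (5, 1), (5, 2), (5, 3), (7, 6), (7, 7)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0)}
STATES = list(product(range(N), range(N)))


def step(state, action):
    """Deterministic transition; moves off the grid leave the agent in place."""
    if state in LAVA:
        return state  # assumed: a victim that enters lava is stuck there
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    return (r, c) if 0 <= r < N and 0 <= c < N else state


def reward(state):
    """Assumed reward: 1 for each time step spent at the goal, 0 elsewhere."""
    return 1.0 if state == GOAL else 0.0


def backward_induction():
    """Finite-horizon value iteration: V_t(s) = r(s) + max_a V_{t+1}(step(s, a))."""
    V = {s: 0.0 for s in STATES}  # terminal values V_H(s) = 0
    policy = []
    for _ in range(H):
        Q = {s: {a: reward(s) + V[step(s, a)] for a in ACTIONS} for s in STATES}
        pi_t = {s: max(Q[s], key=Q[s].get) for s in STATES}
        V = {s: Q[s][pi_t[s]] for s in STATES}
        policy.append(pi_t)
    policy.reverse()  # policy[t] is the greedy action map at time step t
    return V, policy


V0, policy = backward_induction()
# The goal is 18 moves from START along a lava-free path, leaving 2 of the
# H = 20 steps at the goal, so this prints 2.0.
print(V0[START])
```

Making lava absorbing with zero reward is one simple way to encode "dangerous to the victim"; with this choice, H = 20 leaves the victim enough time to reach the goal and stay there, matching the description quoted in the Open Datasets row.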