Model-Free Robust Average-Reward Reinforcement Learning
Authors: Yue Wang, Alvaro Velasquez, George K. Atia, Ashley Prater-Bennette, Shaofeng Zou
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experiments): 'We numerically verify our previous convergence results and demonstrate the robustness of our algorithms. Additional experiments can be found in Appendix G.' |
| Researcher Affiliation | Collaboration | 1University at Buffalo, 2University of Colorado Boulder, 3University of Central Florida, 4Air Force Research Laboratory |
| Pseudocode | Yes | Algorithm 1 Robust RVI TD |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes the problem environments used (Garnet problem, Frozen-Lake environment, Recycling Robot, Inventory Control Problem) and how transition kernels and environment parameters are generated, but provides no concrete access information (links, DOIs, or dataset citations) for any pre-existing, publicly available dataset. |
| Dataset Splits | No | The paper reports experimental repetitions (e.g., runs repeated '30 times') but specifies no training/validation/test dataset splits (no percentages or sample counts). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Open AI Gym' but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | 'We set δ = 0.4, αn = 0.01, f(V) = \|S\|\|A\|' and 'We set δ = 0.4 and implement our algorithms and vanilla Q-learning under the nominal environment (α = β = 0.5) with stepsize 0.01.' |
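The setup row above can be illustrated with a minimal sketch of an RVI-style robust Q-learning update under a δ-contamination uncertainty set, plugging in the quoted δ = 0.4 and stepsize 0.01. The toy random MDP, the mean-of-Q choice of reference function f, and the contamination form of the worst-case value are illustrative assumptions here, not the paper's exact implementation (the quoted definition of f is not fully legible in the extracted text).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy random MDP (hypothetical stand-in; the paper uses Garnet, Frozen-Lake, etc.)
nS, nA = 4, 2
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # nominal transition kernel, shape (nS, nA, nS)
R = rng.uniform(0.0, 1.0, size=(nS, nA))       # bounded rewards

delta = 0.4    # contamination level, as quoted from the paper
alpha = 0.01   # stepsize, as quoted from the paper
n_steps = 5000

Q = np.zeros((nS, nA))
s = 0
for _ in range(n_steps):
    a = rng.integers(nA)
    s_next = rng.choice(nS, p=P[s, a])
    V = Q.max(axis=1)
    # Worst-case next value under a delta-contamination set: mix the sampled
    # transition with the minimum value over states (illustrative form).
    robust_v = (1.0 - delta) * V[s_next] + delta * V.min()
    # RVI-style offset f(Q): mean of all Q-entries (one common reference-function
    # choice; assumed here, not confirmed as the paper's f).
    f_Q = Q.mean()
    # Average-reward robust Q-learning update: reward minus offset plus robust value.
    Q[s, a] += alpha * (R[s, a] - f_Q + robust_v - Q[s, a])
    s = s_next
```

With bounded rewards and the subtracted offset f(Q), the iterates stay bounded; the sketch is only meant to show the shape of the update that the quoted hyperparameters parameterize.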