Model-Free Robust Average-Reward Reinforcement Learning

Authors: Yue Wang, Alvaro Velasquez, George K. Atia, Ashley Prater-Bennette, Shaofeng Zou

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 6 (Experiments): "We numerically verify our previous convergence results and demonstrate the robustness of our algorithms. Additional experiments can be found in Appendix G."
Researcher Affiliation | Collaboration | 1 University at Buffalo, 2 University of Colorado Boulder, 3 University of Central Florida, 4 Air Force Research Laboratory
Pseudocode | Yes | Algorithm 1: Robust RVI TD
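To make the pseudocode entry concrete, here is a minimal Python sketch of what a robust RVI TD-style update could look like under a δ-contamination uncertainty set. The contamination form of the worst-case next-state value and the choice of the reference function f(V) as the mean of V are illustrative assumptions, not a verbatim transcription of the paper's Algorithm 1.

```python
import numpy as np

def robust_rvi_td_step(V, s, r, s_next, delta=0.4, alpha=0.01):
    """One sampled robust RVI TD-style update (illustrative sketch).

    Assumptions: a delta-contamination uncertainty set, so the worst-case
    next-state value is (1 - delta) * V[s_next] + delta * min(V), and a
    reference function f(V) equal to the mean of V.
    """
    sigma = (1.0 - delta) * V[s_next] + delta * np.min(V)  # worst-case next value
    f_V = np.mean(V)                                        # reference offset f(V)
    td_error = r + sigma - f_V - V[s]
    V = V.copy()
    V[s] += alpha * td_error
    return V
```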
Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper describes its problem environments (Garnet problems, the Frozen-Lake environment, the Recycling Robot, and an Inventory Control problem) and how their transition kernels and parameters are generated, but it does not provide concrete access information (links, DOIs, or specific citations with authors and year) for any pre-existing, publicly available dataset.
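Because the Garnet problems are generated on the fly rather than downloaded, a short sketch of the conventional Garnet-style random MDP construction may clarify what "generated transition kernels" means in practice. The branching-factor construction and uniform rewards below follow the standard recipe and are not necessarily the paper's exact parameters.

```python
import numpy as np

def make_garnet(n_states, n_actions, branching, seed=0):
    """Standard Garnet-style random MDP generator (illustrative, not the
    paper's exact setup): each (s, a) pair transitions to `branching`
    uniformly chosen states with Dirichlet-random probabilities, and
    rewards are drawn uniformly from [0, 1].
    """
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            support = rng.choice(n_states, size=branching, replace=False)
            P[s, a, support] = rng.dirichlet(np.ones(branching))
    R = rng.uniform(size=(n_states, n_actions))
    return P, R
```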
Dataset Splits | No | The paper describes experimental runs and repetitions (e.g., "30 times") but does not specify any training, validation, or test dataset splits (e.g., exact percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions OpenAI Gym but does not provide version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "We set δ = 0.4, α_n = 0.01, f(V) = Σ_{s,a} V(s,a) / (|S||A|)" and "We set δ = 0.4 and implement our algorithms and vanilla Q-learning under the nominal environment (α = β = 0.5) with stepsize 0.01."
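The quoted settings (δ = 0.4, stepsize 0.01, reference function equal to the mean, nominal environment) could be assembled into a tabular loop roughly as follows. This is a hedged sketch that assumes a Gymnasium-style discrete environment and a δ-contamination worst-case estimate; it is not the authors' implementation.

```python
import numpy as np

def robust_rvi_q_learning(env, num_steps=50_000, delta=0.4, alpha=0.01, eps=0.1, seed=0):
    """Illustrative robust RVI Q-learning loop under the reported settings.

    Assumptions: `env` follows the Gymnasium discrete-environment API
    (reset() -> (obs, info), step(a) -> (obs, r, terminated, truncated, info)),
    the uncertainty set is a delta-contamination set, and f(Q) is the mean
    of Q over all state-action pairs.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = env.observation_space.n, env.action_space.n
    Q = np.zeros((n_states, n_actions))
    s, _ = env.reset(seed=seed)
    for _ in range(num_steps):
        # Epsilon-greedy exploration over the current Q estimates.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, terminated, truncated, _ = env.step(a)
        # Worst-case greedy value of the next state under delta-contamination.
        sigma = (1.0 - delta) * np.max(Q[s_next]) + delta * np.min(np.max(Q, axis=1))
        f_Q = Q.mean()  # reference offset keeping the iterates bounded
        Q[s, a] += alpha * (r + sigma - f_Q - Q[s, a])
        s = s_next if not (terminated or truncated) else env.reset()[0]
    return Q
```

For example, this loop could be run on FrozenLake via `robust_rvi_q_learning(gym.make("FrozenLake-v1"))` and compared against vanilla Q-learning (δ = 0) in the same nominal environment, mirroring the comparison described in the quoted setup.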