Policy Optimization for Robust Average Reward MDPs

Authors: Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In this section, we provide some simulation results to demonstrate the performance of our algorithm. |
| Researcher Affiliation | Academia | Zhongchang Sun, University at Buffalo (zhongcha@buffalo.edu); Sihong He, University of Texas at Arlington (sihong.he@uta.edu); Fei Miao, University of Connecticut (fei.miao@uconn.edu); Shaofeng Zou, Arizona State University (zou@asu.edu) |
| Pseudocode | Yes | Algorithm 1: Robust Policy Mirror Descent |
| Open Source Code | No | The code will be released if the paper is accepted. |
| Open Datasets | Yes | We verify our method on one classical problem: the Garnet problem, and a robotic application problem: the recycling robot problem. More details can be found in [2]. For more details, refer to [34]. |
| Dataset Splits | No | The paper mentions training episodes and steps but does not specify explicit train/validation/test dataset splits. |
| Hardware Specification | Yes | The host machine used in our experiments is a server configured with AMD Ryzen Threadripper 2990WX 32-core processors and four Quadro RTX 6000 GPUs. |
| Software Dependencies | No | All experiments are performed on Python 3.8. |
| Experiment Setup | Yes | We consider the constant step size and set the step size η = 0.01, the pre-specified radius of the uncertainty set R = 0.1. Each training episode contains 2000 training steps. The length of training episodes is respectively 100 and 300 for Garnet and robot problems. We choose the uncertainty set to be the KL divergence uncertainty set. Both methods use a uniform random policy as the initialized policy. |