Reinforcement Learning under Model Mismatch

Authors: Aurko Roy, Huan Xu, Sebastian Pokutta

NeurIPS 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, we demonstrate empirically the improvement in performance for the robust algorithms compared to their nominal counterparts. For this we used various Reinforcement Learning test environments from Open AI [9] as benchmark to assess the improvement in performance as well as to ensure reproducibility and consistency of our results." |
| Researcher Affiliation | Collaboration | Aurko Roy¹, Huan Xu², and Sebastian Pokutta². ¹Google, Email: aurkor@google.com. ²ISyE, Georgia Institute of Technology, Atlanta, GA, USA. Email: huan.xu@isye.gatech.edu |
| Pseudocode | No | The paper describes its algorithms using mathematical equations (e.g., Equations 12, 23, and 24) but does not present them in a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | No explicit statement about releasing their own code; the paper only notes that OpenAI Gym was used for benchmarking. |
| Open Datasets | Yes | "For this we used various Reinforcement Learning test environments from Open AI [9] as benchmark to assess the improvement in performance as well as to ensure reproducibility and consistency of our results." |
| Dataset Splits | Yes | "The size of the confidence region U_i^a for the robust model is chosen by a 10-fold cross validation via line search." |
| Hardware Specification | No | No specific hardware details (GPU or CPU models, memory) are provided. |
| Software Dependencies | No | The paper mentions the "Open AI gym framework [9]" but does not provide version numbers for it or any other software dependency. |
| Experiment Setup | Yes | "The size of the confidence region U_i^a for the robust model is chosen by a 10-fold cross validation via line search. To test the performance of the robust algorithms, we perturb the models slightly by choosing with a small probability p a random state after every action." FrozenLake-v0 with p = 0.01 (see the illustrative sketches below the table). |
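For context on the "10-fold cross validation via line search" used to choose the confidence region size U_i^a, below is a minimal sketch of what such a selection loop could look like. The candidate sizes, the `train_and_evaluate` callable, and the fold handling are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def select_confidence_size(candidate_sizes, train_and_evaluate, n_folds=10):
    """Pick the confidence-region size with the best cross-validated return.

    `train_and_evaluate(size, fold)` is a hypothetical callable that trains the
    robust agent with the given uncertainty-set size while holding out `fold`
    and returns the average reward obtained on that held-out fold.
    """
    mean_returns = []
    for size in candidate_sizes:  # line search over candidate sizes
        fold_returns = [train_and_evaluate(size, fold) for fold in range(n_folds)]
        mean_returns.append(np.mean(fold_returns))
    # Return the size achieving the highest mean held-out return.
    return candidate_sizes[int(np.argmax(mean_returns))]
```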
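The perturbation described in the Experiment Setup row (with small probability p, the environment jumps to a random state after every action) can be mimicked with a simple Gym wrapper. The sketch below assumes the pre-2021 Gym API (4-tuple `step`) and a tabular environment such as FrozenLake-v0 whose current state is exposed as `env.unwrapped.s`; the wrapper name and this mechanism are illustrative, not the authors' implementation.

```python
import random
import gym


class RandomStatePerturbation(gym.Wrapper):
    """With probability p, jump to a uniformly random state after each action.

    Assumes a tabular (discrete-observation) environment such as FrozenLake-v0,
    whose current state is stored in `env.unwrapped.s`.
    """

    def __init__(self, env, p=0.01):
        super().__init__(env)
        self.p = p

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if random.random() < self.p:
            obs = self.observation_space.sample()  # draw a random state index
            self.env.unwrapped.s = obs             # force the env into that state
        return obs, reward, done, info


# Example: the FrozenLake-v0 benchmark with p = 0.01, as reported in the paper.
env = RandomStatePerturbation(gym.make("FrozenLake-v0"), p=0.01)
```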