Reinforcement Learning under Model Mismatch
Authors: Aurko Roy, Huan Xu, Sebastian Pokutta
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate empirically the improvement in performance for the robust algorithms compared to their nominal counterparts. For this we used various Reinforcement Learning test environments from Open AI [9] as benchmark to assess the improvement in performance as well as to ensure reproducibility and consistency of our results. |
| Researcher Affiliation | Collaboration | Aurko Roy¹, Huan Xu², and Sebastian Pokutta². ¹Google, Email: aurkor@google.com. ²ISyE, Georgia Institute of Technology, Atlanta, GA, USA. Email: huan.xu@isye.gatech.edu |
| Pseudocode | No | The paper describes its algorithms using mathematical equations (e.g., Equations 12, 23, and 24) but does not present them in a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | No explicit statement about releasing their own code; the authors only note that they used the OpenAI gym for benchmarking. |
| Open Datasets | Yes | For this we used various Reinforcement Learning test environments from Open AI [9] as benchmark to assess the improvement in performance as well as to ensure reproducibility and consistency of our results. |
| Dataset Splits | Yes | The size of the confidence region U^a_i for the robust model is chosen by a 10-fold cross validation via line search. |
| Hardware Specification | No | No specific hardware details (GPU, CPU models, memory) are provided. |
| Software Dependencies | No | The paper mentions 'Open AI gym framework [9]' but does not provide version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | The size of the confidence region U^a_i for the robust model is chosen by a 10-fold cross validation via line search. To test the performance of the robust algorithms, we perturb the models slightly by choosing with a small probability p a random state after every action. FrozenLake-v0 with p = 0.01. |
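The perturbation protocol quoted in the last row (with small probability p, jump to a random state after every action, e.g. FrozenLake-v0 with p = 0.01) can be approximated with a thin gym wrapper. The sketch below is illustrative only: the authors released no code, so the wrapper name, the use of `env.unwrapped.s` to overwrite FrozenLake's internal state, and the old 4-tuple `step` API are assumptions, not the paper's implementation.

```python
import gym
import numpy as np


class RandomStatePerturbation(gym.Wrapper):
    """Hypothetical wrapper: after every action, with probability p,
    teleport the environment to a uniformly random state."""

    def __init__(self, env, p=0.01, seed=None):
        super().__init__(env)
        self.p = p
        self.rng = np.random.RandomState(seed)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if not done and self.rng.rand() < self.p:
            # Assumes a discrete-state environment such as FrozenLake-v0,
            # where the internal state index is stored in env.unwrapped.s.
            self.env.unwrapped.s = self.rng.randint(self.env.observation_space.n)
            obs = self.env.unwrapped.s
        return obs, reward, done, info


# Usage sketch: the perturbed environment from the paper's setup.
env = RandomStatePerturbation(gym.make("FrozenLake-v0"), p=0.01)
```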