Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Reinforcement Learning under Model Mismatch
Authors: Aurko Roy, Huan Xu, Sebastian Pokutta
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate empirically the improvement in performance for the robust algorithms compared to their nominal counterparts. For this we used various Reinforcement Learning test environments from OpenAI [9] as benchmark to assess the improvement in performance as well as to ensure reproducibility and consistency of our results. |
| Researcher Affiliation | Collaboration | Aurko Roy¹, Huan Xu², and Sebastian Pokutta². ¹Google, Email: EMAIL. ²ISyE, Georgia Institute of Technology, Atlanta, GA, USA. Email: EMAIL |
| Pseudocode | No | The paper describes its algorithms using mathematical equations (e.g., Equations 12, 23, and 24) but does not present them in a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper makes no explicit statement about releasing its own code; it states only that OpenAI Gym was used for benchmarking. |
| Open Datasets | Yes | For this we used various Reinforcement Learning test environments from OpenAI [9] as benchmark to assess the improvement in performance as well as to ensure reproducibility and consistency of our results. |
| Dataset Splits | Yes | The size of the confidence region $U_i^a$ for the robust model is chosen by a 10-fold cross validation via line search. |
| Hardware Specification | No | No specific hardware details (GPU, CPU models, memory) are provided. |
| Software Dependencies | No | The paper mentions 'OpenAI gym framework [9]' but does not provide version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | The size of the confidence region $U_i^a$ for the robust model is chosen by a 10-fold cross validation via line search. To test the performance of the robust algorithms, we perturb the models slightly by choosing with a small probability p a random state after every action. FrozenLake-v0 with p = 0.01. |
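
The perturbation described in the Experiment Setup row (after every action, jump to a random state with small probability p) could be sketched roughly as follows. This is only an illustration and not the authors' code: the function name `perturb_next_state` is hypothetical, the random state is assumed to be drawn uniformly (the quoted text does not say how it is chosen), and the 16-state count assumes the default 4x4 FrozenLake-v0 map.

```python
import random


def perturb_next_state(next_state, n_states, p, rng):
    """Return next_state, or with probability p a random state instead.

    Assumption: the replacement state is sampled uniformly from all states.
    """
    if rng.random() < p:
        return rng.randrange(n_states)
    return next_state


rng = random.Random(0)  # fixed seed so repeated runs give the same sequence
n_states = 16           # assumption: default 4x4 FrozenLake-v0 grid
p = 0.01                # perturbation probability quoted for FrozenLake-v0

# Stand-in for the states an unperturbed environment would have produced.
nominal = [step % n_states for step in range(1000)]
trajectory = [perturb_next_state(s, n_states, p, rng) for s in nominal]

# Count steps where the perturbation visibly changed the state.
n_perturbed = sum(1 for s, t in zip(nominal, trajectory) if s != t)
print(f"{n_perturbed} of {len(nominal)} transitions were visibly perturbed")
```

In an actual Gym loop, the same function would be applied to the state returned by each environment step before handing it to the agent; how the authors wired this into the environment is not stated in the quoted text.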