Temporal Regularization for Markov Decision Process
Authors: Pierre Thodoroff, Audrey Durand, Joelle Pineau, Doina Precup
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper explores a class of methods for temporal regularization. We formally characterize the bias induced by this technique using Markov chain concepts. We illustrate the various characteristics of temporal regularization via a sequence of simple discrete and continuous MDPs, and show that the technique provides improvement even in high-dimensional Atari games. |
| Researcher Affiliation | Collaboration | Pierre Thodoroff, McGill University, pierre.thodoroff@mail.mcgill.ca; Audrey Durand, McGill University, audrey.durand@mcgill.ca; Joelle Pineau, McGill University & Facebook AI Research, jpineau@cs.mcgill.ca; Doina Precup, McGill University, dprecup@cs.mcgill.ca |
| Pseudocode | Yes | Algorithm 1: Policy evaluation with exponential smoothing |
| Open Source Code | Yes | Our implementation is based on the publicly available OpenAI codebase [8]. The code can be found at https://github.com/pierthodo/temporal_regularization. |
| Open Datasets | Yes | We evaluate the performance in the Arcade Learning Environment [4] |
| Dataset Splits | No | The paper mentions 10 independent runs with random seeds but does not explicitly describe train/validation/test splits or a cross-validation setup for the datasets used in experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "OpenAI codebase" but does not specify version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | The hyper-parameters for the temporal regularization are β = λ = 0.2 and a decay of 1e-5. These are selected on 7 games and 3 training seeds. All other hyper-parameters correspond to those used in the PPO paper. |
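
To make the roles of β and λ in the rows above concrete, here is a minimal sketch of tabular policy evaluation with exponential smoothing. It is an illustration under stated assumptions, not the authors' implementation: the ring-shaped environment, the random policy, the step size `alpha`, the discount `gamma`, and the function name `smoothed_td_evaluation` are all hypothetical, and the update rule is one reading of the idea (a TD target that mixes the next-state value with a smoothed average of past values). The defaults β = λ = 0.2 are borrowed from the Experiment Setup row purely for illustration, and the reported decay of 1e-5 is not modeled.

```python
import random


def smoothed_td_evaluation(n_states=5, steps=5000, alpha=0.1, gamma=0.9,
                           beta=0.2, lam=0.2, seed=0):
    """Tabular TD-style policy evaluation with a temporally smoothed target.

    Hypothetical setup: states form a ring, the policy steps left or right
    with equal probability, and only state 0 yields reward 1.
    """
    rng = random.Random(seed)
    v = [0.0] * n_states
    s = 0
    p = v[s]  # exponentially smoothed average of past value estimates
    for _ in range(steps):
        r = 1.0 if s == 0 else 0.0
        s_next = (s + rng.choice((-1, 1))) % n_states
        # The target mixes the next-state value (weight 1 - beta)
        # with the smoothed past values (weight beta).
        target = r + gamma * ((1.0 - beta) * v[s_next] + beta * p)
        v[s] += alpha * (target - v[s])
        # Fold the freshly updated estimate into the smoothed average.
        p = (1.0 - lam) * v[s] + lam * p
        s = s_next
    return v


if __name__ == "__main__":
    print(smoothed_td_evaluation())
```

Setting `beta=0` in this sketch recovers an ordinary TD(0) update, which gives a simple baseline for seeing how much the smoothed term shifts the estimates.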