Temporal Regularization for Markov Decision Process

Authors: Pierre Thodoroff, Audrey Durand, Joelle Pineau, Doina Precup

Venue: NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper explores a class of methods for temporal regularization. We formally characterize the bias induced by this technique using Markov chain concepts. We illustrate the various characteristics of temporal regularization via a sequence of simple discrete and continuous MDPs, and show that the technique provides improvement even in high-dimensional Atari games.
Researcher Affiliation | Collaboration | Pierre Thodoroff (McGill University, pierre.thodoroff@mail.mcgill.ca); Audrey Durand (McGill University, audrey.durand@mcgill.ca); Joelle Pineau (McGill University & Facebook AI Research, jpineau@cs.mcgill.ca); Doina Precup (McGill University, dprecup@cs.mcgill.ca)
Pseudocode | Yes | Algorithm 1: Policy evaluation with exponential smoothing (see the policy-evaluation sketch below the table)
Open Source Code | Yes | Our implementation is based on the publicly available OpenAI codebase [8]. The code can be found at https://github.com/pierthodo/temporal_regularization.
Open Datasets | Yes | We evaluate the performance in the Arcade Learning Environment [4] (see the environment sketch below the table)
Dataset Splits | No | The paper mentions 10 independent runs with random seeds but does not explicitly describe train/validation/test splits or a cross-validation setup for the datasets used in the experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance specifications used for running the experiments.
Software Dependencies | No | The paper mentions using the "OpenAI codebase" but does not specify version numbers for it or for any other software dependency.
Experiment Setup | Yes | The hyper-parameters for the temporal regularization are β = λ = 0.2 and a decay of 1e-5, selected on 7 games and 3 training seeds. All other hyper-parameters correspond to those used in the PPO paper. (See the configuration sketch below the table.)
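
The pseudocode row refers to the paper's Algorithm 1 (policy evaluation with exponential smoothing). Below is a minimal tabular sketch of the idea: the TD(0) bootstrap target mixes the next-state value with an exponentially smoothed average of the values of previously visited states. The `env`/`policy` interface, the initialization of the smoothed trace, and the exact update order are assumptions for illustration, not the paper's verbatim algorithm.

```python
import numpy as np

def td0_temporal_regularization(env, policy, num_episodes,
                                alpha=0.1, gamma=0.99,
                                beta=0.2, lam=0.2):
    """Tabular TD(0) policy evaluation with temporal regularization (sketch).

    The bootstrap target is a convex combination (weight beta) of the
    next-state value and an exponentially smoothed average (factor lam)
    of the values of states visited earlier in the trajectory.
    `env` is a hypothetical tabular environment exposing `num_states`,
    `reset()` and `step(action) -> (next_state, reward, done)`.
    """
    V = np.zeros(env.num_states)
    for _ in range(num_episodes):
        s = env.reset()
        smoothed_prev = V[s]  # exponential average of past state values
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            # Temporally regularized TD target:
            #   r + gamma * ((1 - beta) * V[s'] + beta * smoothed_prev)
            target = r + gamma * ((1.0 - beta) * V[s_next] + beta * smoothed_prev)
            V[s] += alpha * (target - V[s])
            # Exponential smoothing over previously visited states' values
            smoothed_prev = (1.0 - lam) * V[s] + lam * smoothed_prev
            s = s_next
    return V
```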
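
For the open-datasets row, a minimal sketch of instantiating an Atari task from the Arcade Learning Environment through OpenAI Gym, the interface used by the OpenAI baselines codebase. The environment id, the Gym version (older 4-tuple `step` API), and the absence of frame-preprocessing wrappers here are assumptions; the paper's exact setup is not given in this excerpt.

```python
import gym

# Create an Atari environment from the Arcade Learning Environment via Gym.
# "BreakoutNoFrameskip-v4" is an illustrative choice, not necessarily one of
# the games used in the paper.
env = gym.make("BreakoutNoFrameskip-v4")

obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random actions, for illustration only
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```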
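
For the experiment-setup row, a hedged sketch of how the reported hyper-parameters (β = λ = 0.2, decay of 1e-5) might be organized alongside PPO settings. Only `beta`, `lambda`, and the decay value come from the paper; the PPO values and the linear form of the decay schedule are assumptions.

```python
# Hypothetical configuration mirroring the reported setup: temporal
# regularization strength beta, smoothing factor lambda, and a decay of 1e-5,
# on top of otherwise standard PPO hyper-parameters.
config = {
    # Temporal regularization (values reported in the paper)
    "beta": 0.2,          # weight on the smoothed past-value term
    "lambda": 0.2,        # exponential smoothing factor
    "beta_decay": 1e-5,   # per-update decay of the regularization weight

    # PPO hyper-parameters (assumed defaults from the PPO paper for Atari;
    # not specified in this excerpt)
    "learning_rate": 2.5e-4,
    "clip_range": 0.1,
    "num_steps": 128,
    "num_epochs": 3,
}

def decayed_beta(initial_beta, decay, update_idx):
    """One plausible reading of the decay: linear decay toward zero.

    The exact schedule is not specified in this excerpt; this is an assumption.
    """
    return max(0.0, initial_beta - decay * update_idx)
```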