Mixing-Time Regularized Policy Gradient

Authors: Tetsuro Morimura, Takayuki Osogami, Tomoyuki Shirai

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments in Section 6 show that the proposed method outperforms conventional PGRL methods.
Researcher Affiliation | Collaboration | Tetsuro Morimura, IBM Research Tokyo, 5-6-52 Toyosu, Koto-ku, Tokyo, Japan (tetsuro@jp.ibm.com); Takayuki Osogami, IBM Research Tokyo, 5-6-52 Toyosu, Koto-ku, Tokyo, Japan (osogami@jp.ibm.com); Tomoyuki Shirai, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, Japan (shirai@imi.kyushu-u.ac.jp)
Pseudocode | Yes | Algorithm 1: An implementation of the mixing-time regularized policy gradient reinforcement learning.
Open Source Code | No | The paper does not mention providing open-source code for the described methodology.
Open Datasets | Yes | The task is the simple two-state MDP of (Kakade 2002).
Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The hyper-parameters of the compared methods were tuned. The targeted average reward η̄, a hyper-parameter of the proposed method under Option 2, was set as η̄ := 0.75 max_{θ∈ℝ²} η(θ) = 1.5 (see the illustrative sketch below).
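
To make the Experiment Setup entry concrete, the sketch below shows one way a targeted average reward of the form η̄ := 0.75 max_θ η(θ) could be computed for a small two-state MDP. The transition probabilities, rewards, sigmoid policy parameterization, and grid search are illustrative assumptions only; they are not the exact two-state MDP of (Kakade 2002) nor the optimization used in the paper, where this quantity evaluates to 1.5.

```python
# Illustrative sketch: targeted average reward eta_bar = 0.75 * max_theta eta(theta)
# for a hypothetical two-state MDP (placeholder dynamics and rewards).
import itertools
import numpy as np

# Two states, two actions. P[a][s, s'] = transition probability, R[s, a] = reward.
P = np.array([
    [[0.9, 0.1],   # action 0: mostly stay
     [0.1, 0.9]],
    [[0.1, 0.9],   # action 1: mostly switch
     [0.9, 0.1]],
])
R = np.array([[1.0, 0.0],   # rewards in state 0 for actions 0 / 1 (illustrative)
              [0.0, 2.0]])  # rewards in state 1 for actions 0 / 1 (illustrative)


def average_reward(theta):
    """Average reward eta(theta) under a sigmoid policy pi(a=1|s) = sigmoid(theta[s])."""
    p1 = 1.0 / (1.0 + np.exp(-theta))          # P(a=1 | s) for s = 0, 1
    pi = np.stack([1.0 - p1, p1], axis=1)      # pi[s, a]
    # State transition matrix under the policy: P_pi[s, s'] = sum_a pi[s, a] * P[a][s, s']
    P_pi = np.einsum("sa,ast->st", pi, P)
    # Stationary distribution: left eigenvector of P_pi for eigenvalue 1.
    w, v = np.linalg.eig(P_pi.T)
    d = np.real(v[:, np.argmax(np.real(w))])
    d = d / d.sum()
    r_pi = np.einsum("sa,sa->s", pi, R)        # expected one-step reward per state
    return float(d @ r_pi)


# Crude grid search over theta in R^2, standing in for whatever optimizer the paper used.
grid = np.linspace(-10.0, 10.0, 41)
eta_max = max(average_reward(np.array(t)) for t in itertools.product(grid, grid))
eta_bar = 0.75 * eta_max
print(f"max eta(theta) ~= {eta_max:.3f}, targeted average reward eta_bar ~= {eta_bar:.3f}")
```

The printed values depend entirely on the placeholder MDP above and will not reproduce the paper's 1.5; the point is only to illustrate how a target set as a fixed fraction of the best attainable average reward can be derived for a small MDP.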