Mixing-Time Regularized Policy Gradient

Authors: Tetsuro Morimura, Takayuki Osogami, Tomoyuki Shirai

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments in Section 6 show that the proposed method outperforms conventional PGRL methods.
Researcher Affiliation | Collaboration | Tetsuro Morimura, IBM Research Tokyo, 5-6-52 Toyosu, Koto-ku, Tokyo, Japan (tetsuro@jp.ibm.com); Takayuki Osogami, IBM Research Tokyo, 5-6-52 Toyosu, Koto-ku, Tokyo, Japan (osogami@jp.ibm.com); Tomoyuki Shirai, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, Japan (shirai@imi.kyushu-u.ac.jp)
Pseudocode | Yes | Algorithm 1: An implementation of the mixing-time regularized policy gradient reinforcement learning.
Open Source Code | No | The paper does not mention providing open-source code for the described methodology.
Open Datasets | Yes | The task is the simple two-state MDP of (Kakade 2002).
Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The hyper-parameters of the compared methods were tuned. The targeted average reward η̄, a hyper-parameter of the proposed method under Option 2, was set as η̄ := 0.75 max_{θ∈ℝ²} η(θ) = 1.5 (see the illustrative sketch below).
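
To make the Experiment Setup entry concrete, the sketch below shows one way a targeted average reward of the form η̄ := 0.75 max_θ η(θ) could be computed for a small two-state MDP. The transition probabilities, rewards, sigmoid policy parameterization, and grid search are illustrative assumptions only; they are not the exact two-state MDP of (Kakade 2002) nor the optimization used in the paper, where this quantity evaluates to 1.5.

```python
# Illustrative sketch: targeted average reward eta_bar = 0.75 * max_theta eta(theta)
# for a hypothetical two-state MDP (placeholder dynamics and rewards).
import itertools
import numpy as np

# Two states, two actions. P[a][s, s'] = transition probability, R[s, a] = reward.
P = np.array([
    [[0.9, 0.1],   # action 0: mostly stay
     [0.1, 0.9]],
    [[0.1, 0.9],   # action 1: mostly switch
     [0.9, 0.1]],
])
R = np.array([[1.0, 0.0],   # rewards in state 0 for actions 0 / 1 (illustrative)
              [0.0, 2.0]])  # rewards in state 1 for actions 0 / 1 (illustrative)


def average_reward(theta):
    """Average reward eta(theta) under a sigmoid policy pi(a=1|s) = sigmoid(theta[s])."""
    p1 = 1.0 / (1.0 + np.exp(-theta))          # P(a=1 | s) for s = 0, 1
    pi = np.stack([1.0 - p1, p1], axis=1)      # pi[s, a]
    # State transition matrix under the policy: P_pi[s, s'] = sum_a pi[s, a] * P[a][s, s']
    P_pi = np.einsum("sa,ast->st", pi, P)
    # Stationary distribution: left eigenvector of P_pi for eigenvalue 1.
    w, v = np.linalg.eig(P_pi.T)
    d = np.real(v[:, np.argmax(np.real(w))])
    d = d / d.sum()
    r_pi = np.einsum("sa,sa->s", pi, R)        # expected one-step reward per state
    return float(d @ r_pi)


# Crude grid search over theta in R^2, standing in for whatever optimizer the paper used.
grid = np.linspace(-10.0, 10.0, 41)
eta_max = max(average_reward(np.array(t)) for t in itertools.product(grid, grid))
eta_bar = 0.75 * eta_max
print(f"max eta(theta) ~= {eta_max:.3f}, targeted average reward eta_bar ~= {eta_bar:.3f}")
```

The printed values depend entirely on the placeholder MDP above and will not reproduce the paper's 1.5; the point is only to illustrate how a target set as a fixed fraction of the best attainable average reward can be derived for a small MDP.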