Mixing-Time Regularized Policy Gradient
Authors: Tetsuro Morimura, Takayuki Osogami, Tomoyuki Shirai
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments in Section 6 show that the proposed method outperforms conventional PGRL methods. |
| Researcher Affiliation | Collaboration | Tetsuro Morimura, IBM Research Tokyo, 5-6-52 Toyosu, Koto-ku, Tokyo, Japan (tetsuro@jp.ibm.com); Takayuki Osogami, IBM Research Tokyo, 5-6-52 Toyosu, Koto-ku, Tokyo, Japan (osogami@jp.ibm.com); Tomoyuki Shirai, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, Japan (shirai@imi.kyushu-u.ac.jp) |
| Pseudocode | Yes | Algorithm 1: An implementation of the mixing-time regularized policy gradient reinforcement learning |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | Yes | The task is a simple two-state MDP in (Kakade 2002) |
| Dataset Splits | No | The paper does not provide specific details on dataset splits (e.g., percentages or sample counts for training, validation, or testing). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The hyper-parameters of those methods were tuned. The targeted average reward η̄, which is a hyper-parameter in the proposed method of Option 2, was set as η̄ := 0.75 max_{θ∈ℝ²} η(θ) = 1.5. |
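
The paper's Algorithm 1 and its exact regularizer are not reproduced on this page, so the following is only a minimal, hypothetical sketch of the general idea: gradient ascent on an average-reward objective penalized by a mixing-time proxy, on a small two-state MDP loosely in the spirit of (Kakade 2002). The transition probabilities and rewards, the sigmoid policy parameterization over θ ∈ ℝ², the regularization weight `kappa`, the use of the second-largest eigenvalue modulus of the induced chain as the mixing-time proxy, and the finite-difference gradients are all assumptions made for illustration; they are not taken from the paper.

```python
# Hypothetical sketch (NOT the paper's Algorithm 1): maximize average reward
# minus a mixing-time proxy on a two-state, two-action MDP with assumed values.
import numpy as np

# P[a, s, s'] = transition probability; R[s, a] = expected reward (assumed).
P = np.array([
    [[0.9, 0.1],   # action 0 from states 0, 1
     [0.1, 0.9]],
    [[0.1, 0.9],   # action 1 from states 0, 1
     [0.9, 0.1]],
])
R = np.array([[1.0, 0.0],   # rewards in state 0 for actions 0, 1 (assumed)
              [0.0, 2.0]])  # rewards in state 1 for actions 0, 1 (assumed)

def policy(theta):
    """Sigmoid policy: probability of each action per state, theta in R^2."""
    p1 = 1.0 / (1.0 + np.exp(-theta))
    return np.stack([1.0 - p1, p1], axis=1)        # shape (2 states, 2 actions)

def stationary_distribution(P_pi):
    """Stationary distribution of the ergodic induced chain (eigenvalue 1)."""
    vals, vecs = np.linalg.eig(P_pi.T)
    d = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return d / d.sum()

def objective(theta, kappa=0.5):
    """Average reward eta(theta) minus kappa times a mixing-time proxy."""
    pi = policy(theta)
    P_pi = np.einsum('sa,ast->st', pi, P)          # induced state chain
    R_pi = np.einsum('sa,sa->s', pi, R)            # expected reward per state
    d = stationary_distribution(P_pi)
    eta = float(d @ R_pi)                          # average reward
    slem = np.sort(np.abs(np.linalg.eigvals(P_pi)))[-2]  # mixing-time proxy
    return eta - kappa * slem

def numerical_gradient(f, theta, eps=1e-5):
    """Central finite-difference gradient (stand-in for a PG estimator)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = eps
        grad[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return grad

theta = np.zeros(2)
for _ in range(200):
    theta += 0.5 * numerical_gradient(objective, theta)   # gradient ascent
print("theta:", theta, "regularized objective:", objective(theta))
```

The second-largest eigenvalue modulus governs how quickly the induced Markov chain converges to its stationary distribution, so penalizing it steers the search toward policies whose chains mix fast; this mirrors the motivation for mixing-time regularization, though the paper's own regularizer and update rule may differ.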