Improving Reinforcement Learning with Confidence-Based Demonstrations
Authors: Zhaodong Wang, Matthew E. Taylor
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CHAT using the domains of simulated robot soccer and Mario, empirically showing it outperforms both HAT and learning without transfer. |
| Researcher Affiliation | Academia | Zhaodong Wang, School of EECS, Washington State University, zhaodong.wang@wsu.edu; Matthew E. Taylor, School of EECS, Washington State University, taylorm@eecs.wsu.edu |
| Pseudocode | Yes | Algorithm 1: GPHAT: Bootstrap target learning |
| Open Source Code | Yes | Our code and demonstration datasets are available at the first author's website: irll.eecs.wsu.edu/lab-members/zhaodong-wang/. |
| Open Datasets | Yes | Our code and demonstration datasets are available at the first author's website: irll.eecs.wsu.edu/lab-members/zhaodong-wang/. |
| Dataset Splits | No | Among {0.9, 0.99, 0.999, 0.9999}, preliminary experiments found ΦD = 0.999 to be the best for Keepaway and ΦD = 0.9999 to be the best for Mario (explored further in Section 5.2). |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for experiments. |
| Software Dependencies | Yes | We use version 9.4.5 of the Robocup Soccer Server [Noda et al., 1998], and version 0.9 of the Keepaway player framework [Stone et al., 2006]. For DTHAT, we train a J48 tree with the default parameters of Weka 3.6. The Expectation-Maximization (EM) algorithm [Celeux and Govaert, 1992] is used with default parameter settings in Weka 3.6 [Witten and Frank, 2005]. |
| Experiment Setup | Yes | SARSA uses α = 0.05, ε = 0.1, and γ = 1. Q-learning uses α = 1 10 32, ε = 0.1, and γ = 0.9. The parameter Φ determines when the agent listens to prior knowledge; Φ is multiplied by a decay factor, ΦD, on every time step. Among {0.9, 0.99, 0.999, 0.9999}, preliminary experiments found ΦD = 0.999 to be the best for Keepaway and ΦD = 0.9999 to be the best for Mario (explored further in Section 5.2). The confidence threshold of the neural network is 0.6, while that of the decision tree is 0.85. |
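
The experiment-setup row describes a confidence-thresholded action-reuse rule: with probability Φ the agent consults a classifier trained on demonstrations, reuses its suggestion only when the predicted confidence clears a threshold, and otherwise falls back to its own ε-greedy policy, with Φ decaying by ΦD every step. The sketch below is an illustration only, not the authors' released code: the function names, the `demo_model.suggest()` call, and the dictionary-based Q-table are assumptions; only the numeric parameter values come from the quoted setup.

```python
import random

# Parameter values quoted in the Experiment Setup row (Keepaway / SARSA);
# everything else in this sketch (function names, the demo_model API) is assumed.
ALPHA = 0.05            # SARSA learning rate
EPSILON = 0.1           # exploration rate for epsilon-greedy
GAMMA = 1.0             # discount factor
PHI_DECAY = 0.999       # per-step decay of the reuse probability Phi (0.9999 for Mario)
CONF_THRESHOLD = 0.85   # decision-tree confidence threshold (0.6 for the neural network)


def choose_action(state, q_values, actions, demo_model, phi):
    """With probability phi, consult the demonstration-trained classifier and
    reuse its action if the predicted confidence clears the threshold;
    otherwise fall back to epsilon-greedy over the agent's own Q-values."""
    if random.random() < phi:
        suggested_action, confidence = demo_model.suggest(state)  # hypothetical API
        if confidence >= CONF_THRESHOLD:
            return suggested_action
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))


def sarsa_update(q_values, s, a, reward, s_next, a_next):
    """Standard SARSA update using the quoted alpha and gamma."""
    old = q_values.get((s, a), 0.0)
    td_target = reward + GAMMA * q_values.get((s_next, a_next), 0.0)
    q_values[(s, a)] = old + ALPHA * (td_target - old)
```

In a training loop, `phi` would be multiplied by `PHI_DECAY` after every time step, so the agent consults the demonstration model less and less as its own policy improves.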