Improving Reinforcement Learning with Confidence-Based Demonstrations

Authors: Zhaodong Wang, Matthew E. Taylor

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate CHAT using the domains of simulated robot soccer and Mario, empirically showing it outperforms both HAT and learning without transfer."
Researcher Affiliation | Academia | "Zhaodong Wang, School of EECS, Washington State University, zhaodong.wang@wsu.edu; Matthew E. Taylor, School of EECS, Washington State University, taylorm@eecs.wsu.edu"
Pseudocode | Yes | "Algorithm 1: GPHAT: Bootstrap target learning"
Open Source Code | Yes | "Our code and demonstration datasets are available at the first author's website: irll.eecs.wsu.edu/lab-members/zhaodong-wang/."
Open Datasets | Yes | "Our code and demonstration datasets are available at the first author's website: irll.eecs.wsu.edu/lab-members/zhaodong-wang/."
Dataset Splits | No | "Among {0.9, 0.99, 0.999, 0.9999}, preliminary experiments found ΦD = 0.999 to be the best for Keepaway and ΦD = 0.9999 to be the best for Mario (explored further in Section 5.2)."
Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for experiments.
Software Dependencies | Yes | "We use version 9.4.5 of the RoboCup Soccer Server [Noda et al., 1998], and version 0.9 of the Keepaway player framework [Stone et al., 2006]. For DTHAT, we train a J48 tree with the default parameters of Weka 3.6." "Expectation-Maximization (EM) algorithm [Celeux and Govaert, 1992], with default parameter settings in Weka 3.6 [Witten and Frank, 2005]"
Experiment Setup | Yes | "SARSA uses: α = 0.05, ϵ = 0.1, and γ = 1. Q-learning uses: α = 1/(10×32), ϵ = 0.1, and γ = 0.9. The parameter Φ determines when the agent listens to prior knowledge. Φ is multiplied by a decay factor, ΦD, on every time step. Among {0.9, 0.99, 0.999, 0.9999}, preliminary experiments found ΦD = 0.999 to be the best for Keepaway and ΦD = 0.9999 to be the best for Mario (explored further in Section 5.2). The confidence threshold of the neural network is 0.6, while that of the decision tree is 0.85."
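The Φ/ΦD mechanism quoted above can be sketched as follows. This is a minimal illustration of confidence-based action selection, not the paper's implementation: the function name, the `source_policy` interface (a demonstration-trained classifier returning an action and a confidence), and the `q_values` layout are all hypothetical; the Φ decay (0.999), confidence threshold (0.6), and ϵ (0.1) defaults are taken from the table.

```python
import random


def chat_step(state, q_values, source_policy, phi,
              phi_decay=0.999, conf_threshold=0.6, epsilon=0.1):
    """Sketch of confidence-based action selection (hypothetical names).

    source_policy(state) -> (suggested_action, confidence) stands in for a
    classifier trained on human demonstrations. q_values maps
    state -> {action: value}. Returns the chosen action and the decayed Phi.
    """
    suggestion, confidence = source_policy(state)
    if confidence >= conf_threshold and random.random() < phi:
        # Confident enough: follow the demonstration-derived suggestion.
        action = suggestion
    elif random.random() < epsilon:
        # Explore: pick a random action.
        action = random.choice(list(q_values[state]))
    else:
        # Exploit: pick the greedy action under the current Q estimates.
        action = max(q_values[state], key=q_values[state].get)
    # Phi is multiplied by the decay factor on every time step.
    return action, phi * phi_decay
```

With Φ starting near 1, the agent initially defers to confident demonstration predictions; as Φ decays geometrically it relies increasingly on its own Q estimates, which matches the role the quoted setup assigns to ΦD.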