reproducibilityindex.ai

Improving Reinforcement Learning with Confidence-Based Demonstrations

Authors: Zhaodong Wang, Matthew E. Taylor

IJCAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate CHAT using the domains of simulated robot soccer and Mario, empirically showing it outperforms both HAT and learning without transfer.
Researcher Affiliation	Academia	Zhaodong Wang School of EECS Washington State University zhaodong.wang@wsu.edu Matthew E. Taylor School of EECS Washington State University taylorm@eecs.wsu.edu
Pseudocode	Yes	Algorithm 1: GPHAT: Bootstrap target learning
Open Source Code	Yes	Our code and demonstration datasets are available at the first author's website: irll.eecs.wsu.edu/lab-members/zhaodong-wang/.
Open Datasets	Yes	Our code and demonstration datasets are available at the first author's website: irll.eecs.wsu.edu/lab-members/zhaodong-wang/.
Dataset Splits	No	Among {0.9, 0.99, 0.999, 0.9999}, preliminary experiments found ΦD = 0.999 to be the best for Keepaway and ΦD = 0.9999 to be the best for Mario (explored further in Section 5.2).
Hardware Specification	No	The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for experiments.
Software Dependencies	Yes	We use version 9.4.5 of the Robocup Soccer Server [Noda et al., 1998], and version 0.9 of the Keepaway player framework [Stone et al., 2006]. For DTHAT, we train J48 tree with the default parameters of Weka 3.6. Expectation-Maximization (EM) algorithm [Celeux and Govaert, 1992], with default parameter settings in Weka 3.6 [Witten and Frank, 2005].
Experiment Setup	Yes	SARSA uses: α = 0.05, ϵ = 0.1, and γ = 1. Q-learning uses: α = 1/(10 × 32), ϵ = 0.1, and γ = 0.9. The parameter Φ determines when the agent listens to prior knowledge. Φ is multiplied by a decay factor, ΦD, on every time step. Among {0.9, 0.99, 0.999, 0.9999}, preliminary experiments found ΦD = 0.999 to be the best for Keepaway and ΦD = 0.9999 to be the best for Mario (explored further in Section 5.2). The confidence threshold of neural network is 0.6 while that of the decision tree is 0.85.
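
The Experiment Setup row describes a probability Φ that decays by ΦD on every time step, together with a confidence threshold on the demonstration-derived classifier. The Python sketch below shows one way these two settings could interact. It is an illustration under stated assumptions, not the authors' implementation: the starting value Φ = 1.0, the rule "follow the suggested action with probability Φ only when confidence meets the threshold, otherwise act ε-greedily", and the names `steps_until` and `choose_action` are all hypothetical.

```python
import math
import random


def steps_until(phi_decay, target, phi_start=1.0):
    """Smallest t such that phi_start * phi_decay**t <= target.

    Assumes 0 < phi_decay < 1 and 0 < target < phi_start.
    phi_start = 1.0 is an assumption, not a value from the paper.
    """
    return math.ceil(math.log(target / phi_start) / math.log(phi_decay))


def choose_action(phi, confidence, threshold, suggested, q_values,
                  epsilon=0.1, rng=random):
    """Pick an action index (assumed rule, see lead-in).

    phi        -- current probability of listening to prior knowledge
    confidence -- classifier confidence in its suggested action
    threshold  -- e.g. 0.6 (neural network) or 0.85 (decision tree)
    suggested  -- action index proposed by the classifier
    q_values   -- the learner's current action-value estimates
    """
    # Listen to prior knowledge only if it is confident enough,
    # and then only with probability phi.
    if confidence >= threshold and rng.random() < phi:
        return suggested
    # Otherwise fall back to plain epsilon-greedy on the Q-values.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    for decay in (0.9, 0.99, 0.999, 0.9999):
        print(decay, steps_until(decay, 0.5))
```

Under these assumptions, Φ halves after about 693 steps with ΦD = 0.999 and after about 6,932 steps with ΦD = 0.9999, so the larger decay factor keeps the agent listening to prior knowledge roughly ten times longer.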