Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Improving Reinforcement Learning with Confidence-Based Demonstrations
Authors: Zhaodong Wang, Matthew E. Taylor
IJCAI 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CHAT using the domains of simulated robot soccer and Mario, empirically showing it outperforms both HAT and learning without transfer. |
| Researcher Affiliation | Academia | Zhaodong Wang School of EECS Washington State University EMAIL Matthew E. Taylor School of EECS Washington State University EMAIL |
| Pseudocode | Yes | Algorithm 1: GPHAT: Bootstrap target learning |
| Open Source Code | Yes | Our code and demonstration datasets are available at the first author's website: irll.eecs.wsu.edu/lab-members/zhaodong-wang/. |
| Open Datasets | Yes | Our code and demonstration datasets are available at the first author's website: irll.eecs.wsu.edu/lab-members/zhaodong-wang/. |
| Dataset Splits | No | Among {0.9, 0.99, 0.999, 0.9999}, preliminary experiments found ΦD = 0.999 to be the best for Keepaway and ΦD = 0.9999 to be the best for Mario (explored further in Section 5.2). |
| Hardware Specification | No | The paper does not provide specific hardware details such as CPU/GPU models or memory specifications used for experiments. |
| Software Dependencies | Yes | We use version 9.4.5 of the Robocup Soccer Server [Noda et al., 1998], and version 0.9 of the Keepaway player framework [Stone et al., 2006]. For DTHAT, we train J48 tree with the default parameters of Weka 3.6. Expectation-Maximization (EM) algorithm [Celeux and Govaert, 1992], with default parameter settings in Weka 3.6 [Witten and Frank, 2005] |
| Experiment Setup | Yes | SARSA uses: α = 0.05, ϵ = 0.1, and γ = 1. Q-learning uses: α = 1/(10 × 32), ϵ = 0.1, and γ = 0.9. The parameter Φ determines when the agent listens to prior knowledge. Φ is multiplied by a decay factor, ΦD, on every time step. Among {0.9, 0.99, 0.999, 0.9999}, preliminary experiments found ΦD = 0.999 to be the best for Keepaway and ΦD = 0.9999 to be the best for Mario (explored further in Section 5.2). The confidence threshold of neural network is 0.6 while that of the decision tree is 0.85. |
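
The Experiment Setup row states that Φ is multiplied by a decay factor ΦD on every time step. The following is a minimal sketch of that geometric schedule, meant only to illustrate how strongly the candidate values of ΦD differ. The initial value Φ = 1.0, the cutoff 0.01, and the function names `decay_schedule` and `steps_until_below` are assumptions made for illustration; they are not taken from the paper or its released code.

```python
def decay_schedule(phi_initial, phi_decay, num_steps):
    """Return [phi_0, ..., phi_num_steps] with phi_{t+1} = phi_t * phi_decay."""
    values = [phi_initial]
    for _ in range(num_steps):
        values.append(values[-1] * phi_decay)
    return values


def steps_until_below(phi_decay, threshold, phi_initial=1.0):
    """Count time steps until phi first drops strictly below `threshold`.

    phi_initial=1.0 and the threshold are illustrative assumptions,
    not values reported in the paper.
    """
    if not 0.0 < phi_decay < 1.0:
        raise ValueError("phi_decay must lie strictly between 0 and 1")
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    phi = phi_initial
    steps = 0
    while phi >= threshold:
        phi *= phi_decay  # one multiplication by the decay factor per time step
        steps += 1
    return steps


if __name__ == "__main__":
    # Candidate decay factors listed in the Experiment Setup row.
    for phi_decay in (0.9, 0.99, 0.999, 0.9999):
        print(phi_decay, steps_until_below(phi_decay, threshold=0.01))
```

Under these assumptions, each additional 9 in ΦD lengthens the period during which Φ stays non-negligible by roughly a factor of ten, which may be why the best setting differs between Keepaway and Mario.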