Policy Shaping with Human Teachers

Authors: Thomas Cederborg, Ishaan Grover, Charles L Isbell, Andrea L Thomaz

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we evaluate the performance of a policy shaping algorithm using 26 human teachers. We examine if the algorithm is suitable for human-generated data on two different boards in a pac-man domain, comparing performance to an oracle that provides critique based on one known winning policy.
Researcher Affiliation | Academia | The paper does not explicitly state institutional affiliations or email domains for the authors. The authors are listed as Thomas Cederborg, Ishaan Grover, Charles L Isbell, and Andrea L Thomaz, and the paper appears in the Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), an academic conference.
Pseudocode | No | The paper describes the policy shaping and Q-learning algorithms in narrative text but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no statements about releasing source code, no links to a code repository, and no mention of code in supplementary materials.
Open Datasets | No | The paper describes a 'pac-man' experimental domain in which human teachers generate critique data, but it does not mention a publicly available dataset with concrete access information (link, DOI, or formal citation).
Dataset Splits | No | The paper does not provide training/validation/test dataset splits or percentages; it discusses learning in terms of episodes (games played) rather than data partitioning.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions algorithms such as Q-learning and Boltzmann exploration and gives specific parameter values (T = 1.5, α = 0.05, γ = 0.9), but it does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | In our experiments, parameters were tuned using only Q-learning performance, without teacher critique data, and the values used were T = 1.5, α = 0.05 and γ = 0.9.
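
For context on the Software Dependencies and Experiment Setup rows above: the reported values T = 1.5, α = 0.05, and γ = 0.9 are the Boltzmann temperature, learning rate, and discount factor of a standard tabular Q-learner. Below is a minimal sketch of that baseline using those values; the environment interface (reset/step) and the state indexing are assumptions for illustration, not the paper's implementation.

    import numpy as np

    T, ALPHA, GAMMA = 1.5, 0.05, 0.9  # values reported in the paper

    def boltzmann_policy(q_values, temperature=T):
        # Boltzmann exploration: p(a) proportional to exp(Q(s, a) / T).
        prefs = np.asarray(q_values) / temperature
        prefs = prefs - prefs.max()        # shift for numerical stability
        expd = np.exp(prefs)
        return expd / expd.sum()

    def q_learning_episode(env, Q, rng):
        # One episode of tabular Q-learning with Boltzmann exploration.
        # `env` is a hypothetical interface with reset() -> state and
        # step(action) -> (next_state, reward, done); Q is a 2-D array
        # indexed as Q[state, action].
        state = env.reset()
        done = False
        while not done:
            probs = boltzmann_policy(Q[state])
            action = rng.choice(len(probs), p=probs)
            next_state, reward, done = env.step(action)
            target = reward + (0.0 if done else GAMMA * Q[next_state].max())
            Q[state, action] += ALPHA * (target - Q[state, action])
            state = next_state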
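
The policy shaping algorithm evaluated in the paper builds on Griffith et al. (2013), where the difference between positive and negative critiques an action has received is converted into a probability that the action is optimal, given an assumed critique-consistency level C, and that distribution is combined multiplicatively with the exploration policy. The sketch below illustrates that combination; the value C = 0.95 and the helper names are assumptions for illustration, not parameters reported in this paper.

    import numpy as np

    def feedback_probs(delta, consistency=0.95):
        # Pr(a optimal | critique) = C**d / (C**d + (1 - C)**d)
        #                          = 1 / (1 + ((1 - C) / C)**d),
        # where d = (#positive - #negative) critiques for action a and
        # C is the assumed probability that a critique is consistent
        # with the teacher's intended policy (C = 0.95 is an assumed value).
        ratio = (1.0 - consistency) / consistency
        return 1.0 / (1.0 + ratio ** np.asarray(delta, dtype=float))

    def shaped_policy(q_values, delta, temperature=1.5, consistency=0.95):
        # Combine the Boltzmann exploration distribution with the
        # critique-derived distribution by multiplying action-wise
        # probabilities and renormalizing.
        prefs = np.asarray(q_values) / temperature
        p_q = np.exp(prefs - prefs.max())
        p_q = p_q / p_q.sum()
        combined = p_q * feedback_probs(delta, consistency)
        return combined / combined.sum()

    # Example: three actions with neutral Q-values, where action 0 has
    # received two more positive than negative critiques.
    # shaped_policy(np.zeros(3), delta=[2, 0, 0])
    # -> roughly half the probability mass on action 0, a quarter on each other.

Because the critique distribution multiplies rather than replaces the exploration policy, teacher feedback biases action selection without modifying the Q-values themselves.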