Policy Shaping with Human Teachers

Authors: Thomas Cederborg, Ishaan Grover, Charles L Isbell, Andrea L Thomaz

IJCAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we evaluate the performance of a policy shaping algorithm using 26 human teachers. We examine if the algorithm is suitable for human-generated data on two different boards in a pac-man domain, comparing performance to an oracle that provides critique based on one known winning policy.
Researcher Affiliation | Academia | The paper does not explicitly state institutional affiliations or email domains for the authors. The authors are listed as Thomas Cederborg, Ishaan Grover, Charles L Isbell, and Andrea L Thomaz, and the paper appears in the Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), an academic conference.
Pseudocode | No | The paper describes the policy shaping and Q-learning algorithms in narrative text but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no statements about releasing source code, no links to a code repository, and no mention of code in supplementary materials.
Open Datasets | No | The paper describes a 'pac-man' experimental domain in which human teachers generate critique data, but it does not mention a publicly available dataset with concrete access information (link, DOI, or formal citation).
Dataset Splits | No | The paper does not provide training/validation/test dataset splits or percentages; it discusses learning in terms of episodes (games played) rather than data partitioning.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions algorithms such as Q-learning and Boltzmann exploration and gives specific parameter values (T = 1.5, α = 0.05, γ = 0.9), but it does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | In our experiments, parameters were tuned using only Q-learning performance, without teacher critique data, and the values used were T = 1.5, α = 0.05 and γ = 0.9.
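
For context on the Software Dependencies and Experiment Setup rows above: the reported values T = 1.5, α = 0.05, and γ = 0.9 are the Boltzmann temperature, learning rate, and discount factor of a standard tabular Q-learner. Below is a minimal sketch of that baseline using those values; the environment interface (reset/step) and the state indexing are assumptions for illustration, not the paper's implementation.

    import numpy as np

    T, ALPHA, GAMMA = 1.5, 0.05, 0.9  # values reported in the paper

    def boltzmann_policy(q_values, temperature=T):
        # Boltzmann exploration: p(a) proportional to exp(Q(s, a) / T).
        prefs = np.asarray(q_values) / temperature
        prefs = prefs - prefs.max()        # shift for numerical stability
        expd = np.exp(prefs)
        return expd / expd.sum()

    def q_learning_episode(env, Q, rng):
        # One episode of tabular Q-learning with Boltzmann exploration.
        # `env` is a hypothetical interface with reset() -> state and
        # step(action) -> (next_state, reward, done); Q is a 2-D array
        # indexed as Q[state, action].
        state = env.reset()
        done = False
        while not done:
            probs = boltzmann_policy(Q[state])
            action = rng.choice(len(probs), p=probs)
            next_state, reward, done = env.step(action)
            target = reward + (0.0 if done else GAMMA * Q[next_state].max())
            Q[state, action] += ALPHA * (target - Q[state, action])
            state = next_state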
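
The policy shaping algorithm evaluated in the paper builds on Griffith et al. (2013), where the difference between positive and negative critiques an action has received is converted into a probability that the action is optimal, given an assumed critique-consistency level C, and that distribution is combined multiplicatively with the exploration policy. The sketch below illustrates that combination; the value C = 0.95 and the helper names are assumptions for illustration, not parameters reported in this paper.

    import numpy as np

    def feedback_probs(delta, consistency=0.95):
        # Pr(a optimal | critique) = C**d / (C**d + (1 - C)**d)
        #                          = 1 / (1 + ((1 - C) / C)**d),
        # where d = (#positive - #negative) critiques for action a and
        # C is the assumed probability that a critique is consistent
        # with the teacher's intended policy (C = 0.95 is an assumed value).
        ratio = (1.0 - consistency) / consistency
        return 1.0 / (1.0 + ratio ** np.asarray(delta, dtype=float))

    def shaped_policy(q_values, delta, temperature=1.5, consistency=0.95):
        # Combine the Boltzmann exploration distribution with the
        # critique-derived distribution by multiplying action-wise
        # probabilities and renormalizing.
        prefs = np.asarray(q_values) / temperature
        p_q = np.exp(prefs - prefs.max())
        p_q = p_q / p_q.sum()
        combined = p_q * feedback_probs(delta, consistency)
        return combined / combined.sum()

    # Example: three actions with neutral Q-values, where action 0 has
    # received two more positive than negative critiques.
    # shaped_policy(np.zeros(3), delta=[2, 0, 0])
    # -> roughly half the probability mass on action 0, a quarter on each other.

Because the critique distribution multiplies rather than replaces the exploration policy, teacher feedback biases action selection without modifying the Q-values themselves.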