Policy Shaping with Human Teachers
Authors: Thomas Cederborg, Ishaan Grover, Charles L Isbell, Andrea L Thomaz
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we evaluate the performance of a policy shaping algorithm using 26 human teachers. We examine whether the algorithm is suitable for human-generated data on two different boards in a Pac-Man domain, comparing performance to an oracle that provides critique based on one known winning policy. (An illustrative sketch of the critique-combination step appears below the table.) |
| Researcher Affiliation | Academia | The paper does not explicitly state institutional affiliations or email domains for the authors. The authors are listed as Thomas Cederborg, Ishaan Grover, Charles L. Isbell, and Andrea L. Thomaz, and the paper appears in the Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), an academic venue, which suggests an academic affiliation. |
| Pseudocode | No | The paper describes the policy shaping and Q-learning algorithms in narrative text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing source code, nor does it provide links to a code repository or mention code in supplementary materials. |
| Open Datasets | No | The paper describes using a 'pac-man' experimental domain where human teachers generate critique data, but it does not mention the use of a publicly available dataset with concrete access information (link, DOI, or formal citation). |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits or percentages; it discusses learning episodes in terms of games played rather than data partitioning. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions algorithms like 'Q-learning' and 'Boltzmann exploration' and specific parameter values (T = 1.5, α = 0.05, γ = 0.9) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | In our experiments, parameters were tuned using only Q-learning performance, without teacher critique data, and the values used were T = 1.5, α = 0.05, and γ = 0.9. (An illustrative Q-learning sketch using these values appears below the table.) |
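
For context on the policy shaping algorithm evaluated in the paper: it builds on the approach of Griffith et al. (2013), in which accumulated critique is converted into a probability that each action is optimal and then combined with the learner's own action distribution. The sketch below is a minimal illustration of that combination rule, assuming a tabular setting; the consistency value of 0.8, the function names, and the array-based interface are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def critique_action_probs(delta, consistency=0.8):
    """Estimate P(action is optimal) from net critique counts.

    delta[a] = (# positive critiques) - (# negative critiques) for action a;
    `consistency` is the assumed probability that a single critique is correct.
    """
    delta = np.asarray(delta, dtype=float)
    c, nc = consistency, 1.0 - consistency
    return c ** delta / (c ** delta + nc ** delta)

def shaped_policy(q_action_probs, critique_probs):
    """Combine the learner's own action distribution with the critique-based
    estimate by multiplying the two and renormalizing."""
    combined = np.asarray(q_action_probs, dtype=float) * np.asarray(critique_probs, dtype=float)
    return combined / combined.sum()
```

In the paper's setting, the critique counts would come either from the human teachers or from the oracle that labels actions against one known winning policy; how those counts are binned per state is not reproduced here.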
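
The Q-learning parameters quoted above (T = 1.5, α = 0.05, γ = 0.9) and the use of Boltzmann exploration are reported in the paper; the sketch below shows how such an update and action-selection rule are commonly implemented. The environment interface, state indexing, and variable names are hypothetical assumptions, not the authors' code.

```python
import numpy as np

# Parameter values reported in the paper; everything else here is illustrative.
T = 1.5       # Boltzmann exploration temperature
ALPHA = 0.05  # Q-learning step size
GAMMA = 0.9   # discount factor

def boltzmann_action(q_values, rng, temperature=T):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()            # subtract max for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

def q_update(Q, s, a, reward, s_next, done):
    """One-step Q-learning update on a tabular Q function Q[s, a]."""
    target = reward if done else reward + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])
```

In a tabular Pac-Man setup, `Q` would be a NumPy array indexed by a discretized state id and a legal-action index; the paper does not specify its state encoding, so that indexing is an assumption here.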