Interactive Teaching Strategies for Agent Training
Authors: Ofra Amir, Ece Kamar, Andrey Kolobov, Barbara J. Grosz
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that these approaches reduce the amount of attention required from the teacher compared to teacher-initiated strategies, while maintaining similar learning gains. The empirical evaluation also investigates the effect of the information communicated to the teacher and the quality of the student's initial policy on teaching outcomes. |
| Researcher Affiliation | Collaboration | Ofra Amir, Harvard University (oamir@seas.harvard.edu); Ece Kamar, Microsoft Research (eckamar@microsoft.com); Andrey Kolobov, Microsoft Research (akolobov@microsoft.com); Barbara J. Grosz, Harvard University (grosz@eecs.harvard.edu) |
| Pseudocode | No | The paper describes algorithms in text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-sourcing the code for the described methodology. |
| Open Datasets | Yes | We used the Pac-Man vs. Ghosts League competition [Rohlfshagen and Lucas, 2011] as our experimental domain. |
| Dataset Splits | No | The paper describes "evaluating the student's policy at that time point by averaging 30 evaluation episodes", but does not explicitly detail training, validation, and test dataset splits with percentages or counts, or reference predefined splits for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments. |
| Software Dependencies | No | The paper describes the Sarsa(λ) algorithm and its parameters but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | We used the same parameter configuration as Torrey & Taylor [2013]: ε = 0.05, α = 0.001, γ = 0.999, λ = 0.9. The student agent employed the Sarsa(λ) algorithm to learn the weights in Equation 4. We used a high-level feature representation for state-action pairs. Specifically, we use the 7-feature representation from Torrey & Taylor's [2013] implementation. (A minimal Sarsa(λ) sketch based on this configuration follows the table.) |
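
The experiment setup above describes a Sarsa(λ) student with linear function approximation over 7 high-level state-action features. The following is a minimal sketch of such a learner, assuming the quoted parameters (ε = 0.05, α = 0.001, γ = 0.999, λ = 0.9); the `features(state, action)` function and the `env` interface (`reset`, `step`, `actions`) are hypothetical placeholders, not code from the paper or from Torrey & Taylor's implementation.

```python
import numpy as np

# Minimal Sarsa(lambda) sketch with linear function approximation.
# Parameter values are the ones quoted in the table above; everything
# else (feature function, environment interface) is an assumption.
EPSILON, ALPHA, GAMMA, LAMBDA = 0.05, 0.001, 0.999, 0.9
NUM_FEATURES = 7  # high-level features per state-action pair

def q_value(w, phi):
    """Q(s, a) = w . phi(s, a): linear approximation over state-action features."""
    return np.dot(w, phi)

def epsilon_greedy(w, state, actions, features):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if np.random.rand() < EPSILON:
        return actions[np.random.randint(len(actions))]
    q_values = [q_value(w, features(state, a)) for a in actions]
    return actions[int(np.argmax(q_values))]

def sarsa_lambda_episode(env, w, features):
    """Run one learning episode, updating the weight vector w in place."""
    e = np.zeros_like(w)                      # eligibility traces
    state = env.reset()                       # hypothetical env interface
    action = epsilon_greedy(w, state, env.actions, features)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        phi = features(state, action)
        delta = reward - q_value(w, phi)      # TD error, start with terminal case
        next_action = None
        if not done:
            next_action = epsilon_greedy(w, next_state, env.actions, features)
            delta += GAMMA * q_value(w, features(next_state, next_action))
        e = GAMMA * LAMBDA * e + phi          # accumulate traces
        w += ALPHA * delta * e                # Sarsa(lambda) weight update
        state, action = next_state, next_action
    return w
```

The paper reports learning curves by averaging 30 evaluation episodes at each checkpoint; an evaluation loop would reuse `q_value` greedily (ε = 0), which is an assumption about the protocol rather than a detail quoted above.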