Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints
Authors: Sebastian Tschiatschek, Ahana Ghosh, Luis Haug, Rati Devidze, Adish Singla
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design learner-aware teaching algorithms and show that significant performance improvements can be achieved over learner-agnostic teaching. ... IV We empirically show that significant performance improvements can be achieved by learner-aware teachers as compared to learner-agnostic teachers (Section 6). ... In this section we evaluate our teaching algorithms for different types of learners on the environment introduced in Figure 1. |
| Researcher Affiliation | Collaboration | Sebastian Tschiatschek, Microsoft Research, setschia@microsoft.com; Ahana Ghosh, MPI-SWS, gahana@mpi-sws.org; Luis Haug, ETH Zurich, lhaug@inf.ethz.ch; Rati Devidze, MPI-SWS, rdevidze@mpi-sws.org; Adish Singla, MPI-SWS, adishs@mpi-sws.org |
| Pseudocode | Yes | Algorithm 1 Teacher-learner interaction in the adaptive teaching setting |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | No | The environment we consider here has three types of reward objects, i.e., a "star" object with reward of 1.0, a "plus" object with reward of 0.9, and a "dot" object with reward of 0.2. Two objects of each type are placed randomly on the grid such that there is always only a single object in each grid cell. ... The paper describes a custom environment setup and does not refer to a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes the experimental setup and evaluation but does not specify train/validation/test dataset splits with percentages or counts for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions general algorithms and frameworks (e.g., 'MCE-IRL framework', 'Frank-Wolfe algorithm', 'Soft-Value-Iteration procedure') but does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We use a discount factor of $\gamma = 0.99$. ... For the learner models in Section 3, the optimal learner-aware teaching problem can be naturally formalized as the following bi-level optimization problem ... In this section we consider learners with soft constraints from Section 3.2, with preference features as described above, and parameters $C_r = 5$, $C_c = 10$, and $\delta_c^{\text{hard}} = 0$ (more experimental results for different values of $C_r$ and $C_c$ are provided in Appendix B.1 of the supplementary). ... To implement the learner in Eq. 2, we approximated the learner's projection onto the set $\Omega_r^L$ as follows: We implemented the learner based on the optimization problem given in Eq. 3 with a hard constraint on preferences and L2 norm penalty on reward mismatch scaled with a large value of $C_r = 20$. |
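
Because no code or dataset is released, the environment has to be rebuilt from the quoted description. The following is a minimal sketch of the object placement and the discounted return, not the authors' implementation. The rewards, the two-objects-per-type rule, and the discount factor come from the table above. The grid dimensions, the function names `place_objects` and `discounted_return`, and the seeding scheme are assumptions made for illustration.

```python
import random

# Rewards per object type, as quoted in the "Open Datasets" row.
OBJECT_REWARDS = {"star": 1.0, "plus": 0.9, "dot": 0.2}
OBJECTS_PER_TYPE = 2  # "Two objects of each type are placed randomly"
GAMMA = 0.99          # discount factor quoted in the "Experiment Setup" row


def place_objects(height, width, rng):
    """Return {(row, col): object_type} with at most one object per cell.

    The grid size is a free parameter here (hypothetical): the excerpt
    above does not state the dimensions used in the paper.
    """
    cells = [(r, c) for r in range(height) for c in range(width)]
    n_objects = OBJECTS_PER_TYPE * len(OBJECT_REWARDS)
    if n_objects > len(cells):
        raise ValueError("grid too small for one object per cell")
    # Sampling without replacement guarantees distinct cells.
    chosen = rng.sample(cells, n_objects)
    labels = [name for name in OBJECT_REWARDS for _ in range(OBJECTS_PER_TYPE)]
    return dict(zip(chosen, labels))


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**t * r_t over a sequence of per-step rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    layout = place_objects(6, 6, random.Random(0))  # 6x6 is an assumed size
    for cell, name in sorted(layout.items()):
        print(cell, name, OBJECT_REWARDS[name])
```

Passing an explicit `random.Random` instance keeps each random layout reproducible from its seed, which is one way to compensate for the missing dataset when comparing teachers across runs.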