Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints

Authors: Sebastian Tschiatschek, Ahana Ghosh, Luis Haug, Rati Devidze, Adish Singla

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design learner-aware teaching algorithms and show that significant performance improvements can be achieved over learner-agnostic teaching. ... (IV) We empirically show that significant performance improvements can be achieved by learner-aware teachers as compared to learner-agnostic teachers (Section 6). ... In this section we evaluate our teaching algorithms for different types of learners on the environment introduced in Figure 1.
Researcher Affiliation | Collaboration | Sebastian Tschiatschek (Microsoft Research, setschia@microsoft.com); Ahana Ghosh (MPI-SWS, gahana@mpi-sws.org); Luis Haug (ETH Zurich, lhaug@inf.ethz.ch); Rati Devidze (MPI-SWS, rdevidze@mpi-sws.org); Adish Singla (MPI-SWS, adishs@mpi-sws.org)
Pseudocode | Yes | Algorithm 1: Teacher-learner interaction in the adaptive teaching setting
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The environment we consider here has three types of reward objects, i.e., a "star" object with reward of 1.0, a "plus" object with reward of 0.9, and a "dot" object with reward of 0.2. Two objects of each type are placed randomly on the grid such that there is always only a single object in each grid cell. ... The paper describes a custom environment setup and does not refer to a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes the experimental setup and evaluation but does not specify train/validation/test dataset splits with percentages or counts for reproducibility.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory.
Software Dependencies | No | The paper mentions general algorithms and frameworks (e.g., 'MCE-IRL framework', 'Frank-Wolfe algorithm', 'Soft-Value-Iteration procedure') but does not specify any software names with version numbers for reproducibility.
Experiment Setup | Yes | We use a discount factor of γ = 0.99. ... For the learner models in Section 3, the optimal learner-aware teaching problem can be naturally formalized as the following bi-level optimization problem ... In this section we consider learners with soft constraints from Section 3.2, with preference features as described above, and parameters C_r = 5, C_c = 10, and δ_c^hard = 0 (more experimental results for different values of C_r and C_c are provided in Appendix B.1 of the supplementary). ... To implement the learner in Eq. 2, we approximated the learner's projection onto the set Ω_r^L as follows: We implemented the learner based on the optimization problem given in Eq. 3 with a hard constraint on preferences and L2 norm penalty on reward mismatch scaled with a large value of C_r = 20.
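The object layout quoted in the Open Datasets row can be illustrated with a short sketch. This is not the authors' code: the grid size is not stated in the quoted text, so `grid_size` is a hypothetical parameter, and the function names and the zero reward for empty cells are assumptions made for illustration only.

```python
import random

# Rewards per object type, as quoted in the Open Datasets row.
OBJECT_REWARDS = {"star": 1.0, "plus": 0.9, "dot": 0.2}
OBJECTS_PER_TYPE = 2  # "Two objects of each type are placed randomly"


def place_objects(grid_size, seed=None):
    """Return {(row, col): object_type} with at most one object per cell.

    grid_size is hypothetical: the quoted text does not give the grid
    dimensions, so a square grid is assumed here.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(grid_size) for c in range(grid_size)]
    labels = [name for name in OBJECT_REWARDS for _ in range(OBJECTS_PER_TYPE)]
    if len(labels) > len(cells):
        raise ValueError("grid too small to give each object its own cell")
    # Sampling without replacement guarantees that all chosen cells differ.
    chosen = rng.sample(cells, len(labels))
    return dict(zip(chosen, labels))


def cell_reward(layout, cell):
    """Reward of the object in `cell`; empty cells yield 0.0 (an assumption)."""
    return OBJECT_REWARDS.get(layout.get(cell), 0.0)
```

Calling `place_objects(5, seed=0)` would give a reproducible layout of six objects on a 5x5 grid; drawing cells without replacement is one simple way to enforce the single-object-per-cell condition described in the row.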