reproducibilityindex.ai

Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints

Authors: Sebastian Tschiatschek, Ahana Ghosh, Luis Haug, Rati Devidze, Adish Singla

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We design learner-aware teaching algorithms and show that significant performance improvements can be achieved over learner-agnostic teaching. ... IV We empirically show that significant performance improvements can be achieved by learner-aware teachers as compared to learner-agnostic teachers (Section 6). ... In this section we evaluate our teaching algorithms for different types of learners on the environment introduced in Figure 1.
Researcher Affiliation	Collaboration	Sebastian Tschiatschek, Microsoft Research, setschia@microsoft.com; Ahana Ghosh, MPI-SWS, gahana@mpi-sws.org; Luis Haug, ETH Zurich, lhaug@inf.ethz.ch; Rati Devidze, MPI-SWS, rdevidze@mpi-sws.org; Adish Singla, MPI-SWS, adishs@mpi-sws.org
Pseudocode	Yes	Algorithm 1 Teacher-learner interaction in the adaptive teaching setting
Open Source Code	No	The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets	No	The environment we consider here has three types of reward objects, i.e., a "star" object with reward of 1.0, a "plus" object with reward of 0.9, and a "dot" object with reward of 0.2. Two objects of each type are placed randomly on the grid such that there is always only a single object in each grid cell. ... The paper describes a custom environment setup and does not refer to a publicly available dataset with concrete access information.
Dataset Splits	No	The paper describes the experimental setup and evaluation but does not specify train/validation/test dataset splits with percentages or counts for reproducibility.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory.
Software Dependencies	No	The paper mentions general algorithms and frameworks (e.g., 'MCE-IRL framework', 'Frank-Wolfe algorithm', 'Soft-Value-Iteration procedure') but does not specify any software names with version numbers for reproducibility.
Experiment Setup	Yes	We use a discount factor of γ = 0.99. ... For the learner models in Section 3, the optimal learner-aware teaching problem can be naturally formalized as the following bi-level optimization problem ... In this section we consider learners with soft constraints from Section 3.2, with preference features as described above, and parameters C_r = 5, C_c = 10, and δ_c^hard = 0 (more experimental results for different values of C_r and C_c are provided in Appendix B.1 of the supplementary). ... To implement the learner in Eq. 2, we approximated the learner's projection onto the set Ω_r^L as follows: We implemented the learner based on the optimization problem given in Eq. 3 with a hard constraint on preferences and L2 norm penalty on reward mismatch scaled with a large value of C_r = 20.
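
Since no code is released, the object placement quoted in the Open Datasets row has to be rebuilt by anyone attempting a reproduction. The following is a minimal sketch of that one step, not the authors' implementation. The reward values and the "two objects of each type, one object per cell" rule come from the quoted text; the function name `place_objects`, the 10x10 grid size, and the seed are hypothetical, because the excerpt above does not state the grid dimensions.

```python
import random

# Rewards per object type, as quoted in the "Open Datasets" row.
REWARDS = {"star": 1.0, "plus": 0.9, "dot": 0.2}
OBJECTS_PER_TYPE = 2  # "Two objects of each type are placed randomly"


def place_objects(width, height, seed=None):
    """Return {(row, col): object_type} with at most one object per cell.

    `width`, `height`, and `seed` are illustrative parameters; the excerpt
    does not specify the grid dimensions used in the paper.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(height) for c in range(width)]
    needed = OBJECTS_PER_TYPE * len(REWARDS)
    if needed > len(cells):
        raise ValueError("grid has too few cells for all objects")
    # Sampling without replacement guarantees distinct cells.
    chosen = rng.sample(cells, needed)
    types = [t for t in REWARDS for _ in range(OBJECTS_PER_TYPE)]
    return dict(zip(chosen, types))


# Assumed 10x10 grid, for illustration only.
layout = place_objects(width=10, height=10, seed=0)
```

Fixing the seed makes the layout repeatable across runs, which is one of the details a reproduction would need to record alongside the discount factor and the C_r, C_c settings listed in the Experiment Setup row.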