Repeated Inverse Reinforcement Learning

Authors: Kareem Amin, Nan Jiang, Satinder Singh

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We formalize this problem, including how the sequence of tasks is chosen, in a few different ways and provide some foundational results. In summary form, our contributions include: (1) an efficient reward-identification algorithm when the agent can choose the tasks in which it observes human behavior; (2) an upper bound on the number of total surprises when no assumptions are made on the tasks, along with a corresponding lower bound; (3) an extension to the setting where the human provides sample trajectories instead of complete behavior; and (4) identification guarantees when the agent can only choose the task rewards but is given a fixed task environment. Theorem 2. For $\Theta_0 = [-1, 1]^d$, the number of mistakes made by Algorithm 1 is guaranteed to be $O(d^2 \log(d/\epsilon))$.
Researcher Affiliation | Collaboration | Kareem Amin, Google Research, New York, NY 10011, kamin@google.com; Nan Jiang and Satinder Singh, Computer Science & Engineering, University of Michigan, Ann Arbor, MI 48104, {nanjiang,baveja}@umich.edu
Pseudocode | Yes | Algorithm 1: Ellipsoid Algorithm for Repeated Inverse Reinforcement Learning; Algorithm 2: Trajectory version of Algorithm 1 for MDPs
Open Source Code | No | The paper does not contain any statements about releasing code or links to source code repositories for the described methodology.
Open Datasets | No | The paper is theoretical and does not describe experiments using specific datasets, thus no information on publicly available datasets for training is provided.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or dataset usage, therefore no specific dataset split information for validation is provided.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require hardware specifications.
Software Dependencies | No | The paper is theoretical and does not mention specific software names with version numbers or dependencies for replication.
Experiment Setup | No | The paper is theoretical and focuses on mathematical formulations and algorithms, therefore it does not contain details about experimental setup or hyperparameters.
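
The Pseudocode entry names an ellipsoid algorithm, but this page does not reproduce it. As a rough illustration of the kind of update ellipsoid methods generally rely on, the sketch below implements the textbook central-cut step: given an ellipsoid and a hyperplane through its center, it returns the smallest ellipsoid containing the kept half. This is not the paper's Algorithm 1. The function name `central_cut_update` and its arguments are hypothetical, and how a cut direction would be derived from observed human behavior is not shown here.

```python
import math


def central_cut_update(center, shape, g):
    """One textbook central-cut ellipsoid step (illustrative, hypothetical name).

    The ellipsoid is {x : (x - center)^T shape^{-1} (x - center) <= 1}.
    Returns the minimum-volume ellipsoid containing the half
    {x in ellipsoid : g . (x - center) <= 0}. Requires dimension d >= 2.
    """
    d = len(center)
    if d < 2:
        raise ValueError("central-cut update needs dimension >= 2")

    # Ag = shape @ g, and gAg = g^T shape g (positive for a valid ellipsoid).
    Ag = [sum(shape[i][j] * g[j] for j in range(d)) for i in range(d)]
    gAg = sum(g[i] * Ag[i] for i in range(d))
    if gAg <= 0:
        raise ValueError("cut direction must be nonzero and shape positive definite")

    # Normalized step direction b = shape g / sqrt(g^T shape g).
    norm = math.sqrt(gAg)
    b = [v / norm for v in Ag]

    # Shift the center away from the discarded half-space.
    new_center = [center[i] - b[i] / (d + 1) for i in range(d)]

    # Shrink along b and rescale: (d^2 / (d^2 - 1)) * (shape - 2/(d+1) * b b^T).
    scale = d * d / (d * d - 1.0)
    new_shape = [
        [scale * (shape[i][j] - 2.0 / (d + 1) * b[i] * b[j]) for j in range(d)]
        for i in range(d)
    ]
    return new_center, new_shape


if __name__ == "__main__":
    # Unit disk in 2D, discarding the half with x[0] > 0.
    c, A = central_cut_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
    print(c, A)
```

For the unit disk with cut direction `[1.0, 0.0]`, the new center is approximately (-1/3, 0) and the new shape matrix is approximately diag(4/9, 4/3). Each such step shrinks the ellipsoid's volume by a factor that depends only on the dimension, which is the general reason ellipsoid-style arguments yield mistake bounds that are polynomial in d and logarithmic in 1/ϵ.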