Repeated Inverse Reinforcement Learning
Authors: Kareem Amin, Nan Jiang, Satinder Singh
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We formalize this problem, including how the sequence of tasks is chosen, in a few different ways and provide some foundational results. In summary form, our contributions include: (1) an efficient reward-identification algorithm when the agent can choose the tasks in which it observes human behavior; (2) an upper bound on the number of total surprises when no assumptions are made on the tasks, along with a corresponding lower bound; (3) an extension to the setting where the human provides sample trajectories instead of complete behavior; and (4) identification guarantees when the agent can only choose the task rewards but is given a fixed task environment. Theorem 2. For Θ₀ = [−1, 1]ᵈ, the number of mistakes made by Algorithm 1 is guaranteed to be O(d² log(d/ϵ)). |
| Researcher Affiliation | Collaboration | Kareem Amin: Google Research, New York, NY 10011 (kamin@google.com); Nan Jiang, Satinder Singh: Computer Science & Engineering, University of Michigan, Ann Arbor, MI 48104 ({nanjiang,baveja}@umich.edu) |
| Pseudocode | Yes | Algorithm 1 Ellipsoid Algorithm for Repeated Inverse Reinforcement Learning; Algorithm 2 Trajectory version of Algorithm 1 for MDPs |
| Open Source Code | No | The paper does not contain any statements about releasing code or links to source code repositories for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using specific datasets, thus no information on publicly available datasets for training is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or dataset usage, therefore no specific dataset split information for validation is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software names with version numbers or dependencies for replication. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical formulations and algorithms, therefore it does not contain details about experimental setup or hyperparameters. |
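The paper ships no code, so the following is only a minimal sketch of the central-cut ellipsoid idea behind Algorithm 1. It is not the authors' implementation. The simplified setting is our assumption. Each task is a task-reward vector X over d states. The human picks the single state h that maximizes θ* + X. The agent picks the state a that maximizes c + X, where c is the ellipsoid center. When the agent's choice is worse than the human's by more than ϵ (an ϵ-mistake), the agent cuts with g = e_h − e_a. The cut is valid for two reasons. First, the agent's choice implies g·c ≤ X_a − X_h. Second, the mistake implies g·θ* > X_a − X_h + ϵ. Together these put θ* in the kept half-space {θ : g·θ ≥ g·c}. The names `ellipsoid_cut`, `contains` and `run_repeated_irl` are hypothetical, and so are the simulation parameters.

```python
import math
import random

def ellipsoid_cut(c, A, g):
    """Central-cut update of E = {t : (t - c)^T A^{-1} (t - c) <= 1}, keeping {t : g.t >= g.c}.
    Returns the minimum-volume ellipsoid containing that half of E (needs dimension >= 2)."""
    n = len(c)
    Ag = [sum(A[i][j] * g[j] for j in range(n)) for i in range(n)]
    gAg = sum(g[i] * Ag[i] for i in range(n))
    b = [x / math.sqrt(gAg) for x in Ag]  # points toward the kept half
    new_c = [c[i] + b[i] / (n + 1) for i in range(n)]
    scale = n * n / (n * n - 1.0)
    new_A = [[scale * (A[i][j] - 2.0 / (n + 1) * b[i] * b[j]) for j in range(n)]
             for i in range(n)]
    return new_c, new_A

def contains(c, A, theta, tol=1e-9):
    """Check (theta - c)^T A^{-1} (theta - c) <= 1 by solving A x = theta - c."""
    n = len(c)
    r = [theta[i] - c[i] for i in range(n)]
    M = [row[:] + [r[i]] for i, row in enumerate(A)]  # augmented matrix [A | r]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for j in range(col, n + 1):
                M[k][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return sum(r[i] * x[i] for i in range(n)) <= 1 + tol

def run_repeated_irl(theta_star, rounds, eps, seed=0):
    """Hypothetical simulation: random task rewards X, one-hot state choices."""
    rng = random.Random(seed)
    d = len(theta_star)
    c = [0.0] * d
    # Ball of radius sqrt(d), which contains the cube [-1, 1]^d.
    A = [[float(d) if i == j else 0.0 for j in range(d)] for i in range(d)]
    mistakes = 0
    for _ in range(rounds):
        X = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # task reward
        a = max(range(d), key=lambda i: c[i] + X[i])  # agent acts on the center
        h = max(range(d), key=lambda i: theta_star[i] + X[i])  # human is optimal
        if theta_star[h] + X[h] - (theta_star[a] + X[a]) > eps:  # eps-mistake
            mistakes += 1
            g = [0.0] * d
            g[h] += 1.0
            g[a] -= 1.0
            c, A = ellipsoid_cut(c, A, g)
    return c, A, mistakes
```

For example, `run_repeated_irl([0.9, -0.5, 0.1], rounds=200, eps=0.05)` returns the final center, the shape matrix and the mistake count. Calling `contains` on the true θ* should hold after every cut, because each update keeps a half-space that contains θ*. The cuts never shrink the ellipsoid along the all-ones direction. This matches the fact that adding a constant to θ* does not change the human's choices.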