Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Repeated Inverse Reinforcement Learning
Authors: Kareem Amin, Nan Jiang, Satinder Singh
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We formalize this problem, including how the sequence of tasks is chosen, in a few different ways and provide some foundational results. In summary form, our contributions include: (1) an efficient reward-identification algorithm when the agent can choose the tasks in which it observes human behavior; (2) an upper bound on the number of total surprises when no assumptions are made on the tasks, along with a corresponding lower bound; (3) an extension to the setting where the human provides sample trajectories instead of complete behavior; and (4) identification guarantees when the agent can only choose the task rewards but is given a fixed task environment. Theorem 2. For $\Theta_0 = [-1, 1]^d$, the number of mistakes made by Algorithm 1 is guaranteed to be $O(d^2 \log(d/\epsilon))$. |
| Researcher Affiliation | Collaboration | Kareem Amin, Google Research, New York, NY 10011, EMAIL; Nan Jiang and Satinder Singh, Computer Science & Engineering, University of Michigan, Ann Arbor, MI 48104, EMAIL |
| Pseudocode | Yes | Algorithm 1 Ellipsoid Algorithm for Repeated Inverse Reinforcement Learning; Algorithm 2 Trajectory version of Algorithm 1 for MDPs |
| Open Source Code | No | The paper does not contain any statements about releasing code or links to source code repositories for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using specific datasets, thus no information on publicly available datasets for training is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments or dataset usage, therefore no specific dataset split information for validation is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software names with version numbers or dependencies for replication. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical formulations and algorithms, therefore it does not contain details about experimental setup or hyperparameters. |