reproducibilityindex.ai

Repeated Inverse Reinforcement Learning

Authors: Kareem Amin, Nan Jiang, Satinder Singh

NeurIPS 2017

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We formalize this problem, including how the sequence of tasks is chosen, in a few different ways and provide some foundational results. In summary form, our contributions include: (1) an efficient reward-identification algorithm when the agent can choose the tasks in which it observes human behavior; (2) an upper bound on the number of total surprises when no assumptions are made on the tasks, along with a corresponding lower bound; (3) an extension to the setting where the human provides sample trajectories instead of complete behavior; and (4) identification guarantees when the agent can only choose the task rewards but is given a fixed task environment. Theorem 2. For Θ_0 = [-1, 1]^d, the number of mistakes made by Algorithm 1 is guaranteed to be O(d^2 log(d/ϵ)).
Researcher Affiliation	Collaboration	Kareem Amin, Google Research, New York, NY 10011, kamin@google.com; Nan Jiang and Satinder Singh, Computer Science & Engineering, University of Michigan, Ann Arbor, MI 48104, {nanjiang,baveja}@umich.edu
Pseudocode	Yes	Algorithm 1 Ellipsoid Algorithm for Repeated Inverse Reinforcement Learning; Algorithm 2 Trajectory version of Algorithm 1 for MDPs
Open Source Code	No	The paper does not contain any statements about releasing code or links to source code repositories for the described methodology.
Open Datasets	No	The paper is theoretical and does not describe experiments using specific datasets, thus no information on publicly available datasets for training is provided.
Dataset Splits	No	The paper is theoretical and does not describe empirical experiments or dataset usage, therefore no specific dataset split information for validation is provided.
Hardware Specification	No	The paper is theoretical and does not describe any computational experiments that would require hardware specifications.
Software Dependencies	No	The paper is theoretical and does not mention specific software names with version numbers or dependencies for replication.
Experiment Setup	No	The paper is theoretical and focuses on mathematical formulations and algorithms, therefore it does not contain details about experimental setup or hyperparameters.
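
The table names an ellipsoid-based procedure (Algorithm 1) and a mistake bound over Θ_0 = [-1, 1]^d, but the paper ships no code. As a rough illustration only, the sketch below shows the textbook central-cut ellipsoid update that methods of this kind build on. It is not the authors' Algorithm 1: the function names, the pure-Python list representation, and the choice of cut direction `g` are all assumptions made for this example. The starting ellipsoid is the ball of radius sqrt(d), which encloses the cube [-1, 1]^d.

```python
import math


def initial_ellipsoid(d):
    """Ball of radius sqrt(d) centred at the origin; it encloses [-1, 1]^d.

    The ellipsoid is {x : (x - c)^T A^{-1} (x - c) <= 1}, stored as (c, A).
    """
    center = [0.0] * d
    shape = [[float(d) if i == j else 0.0 for j in range(d)] for i in range(d)]
    return center, shape


def central_cut(center, shape, g):
    """Smallest ellipsoid containing the half {x : g . (x - center) <= 0}.

    Standard central-cut formulas; `g` is a hypothetical cut direction
    (for example, derived from a demonstration that contradicts the
    current estimate). Requires dimension n >= 2 and g != 0.
    """
    n = len(center)
    if n < 2:
        raise ValueError("central-cut update needs dimension >= 2")
    # A g, then normalise by sqrt(g^T A g).
    Ag = [sum(shape[i][j] * g[j] for j in range(n)) for i in range(n)]
    scale = math.sqrt(sum(g[i] * Ag[i] for i in range(n)))
    b = [v / scale for v in Ag]
    # Move the centre into the kept half-space.
    new_center = [center[i] - b[i] / (n + 1) for i in range(n)]
    # Shrink along the cut direction, inflate slightly elsewhere.
    factor = n * n / (n * n - 1.0)
    new_shape = [
        [factor * (shape[i][j] - 2.0 / (n + 1) * b[i] * b[j]) for j in range(n)]
        for i in range(n)
    ]
    return new_center, new_shape


if __name__ == "__main__":
    c, A = initial_ellipsoid(3)
    c, A = central_cut(c, A, [1.0, 0.0, 0.0])  # keep the half with x_0 <= 0
    print(c)
    print(A)
```

Each such cut reduces the ellipsoid's volume by a factor that depends only on the dimension, which is the mechanism behind logarithmic-in-1/ϵ bounds of the kind quoted in Theorem 2; the exact constants and the rule for choosing cuts are in the paper itself.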