Lifelong Inverse Reinforcement Learning

Authors: Jorge Mendez, Shashank Shivkumar, Eric Eaton

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated ELIRL on two environments, chosen to allow us to create arbitrarily many tasks with distinct reward functions. This also gives us known rewards as ground truth. No previous multi-task IRL method was tested on such a large task set, nor on tasks with varying state spaces as we do.
Researcher Affiliation | Academia | Jorge A. Mendez, Shashank Shivkumar, and Eric Eaton, Department of Computer and Information Science, University of Pennsylvania, {mendezme,shashs,eeaton}@seas.upenn.edu
Pseudocode | Yes | Algorithm 1 ELIRL (k, λ, µ) (a hedged sketch of the update this algorithm implies appears after the table)
Open Source Code | No | The paper mentions 'BURLAP Java library, version 3.0' [24] as a third-party tool but does not provide any statement or link indicating that the source code for the proposed ELIRL method itself is open-source or publicly available.
Open Datasets | No | The paper describes how it generated data using 'Objectworld' and 'Highway' simulations: 'We solved the MDP for the true optimal policy, and generated simulated user trajectories following this policy.' It does not refer to using a publicly available, pre-existing dataset with access information.
Dataset Splits | No | The paper specifies 'All learners were given nt = 32 trajectories for Objectworld and nt = 256 trajectories for Highway, all of length H = 16.' This specifies the amount of demonstration data provided for learning each task but does not describe conventional train/validation/test splits of a fixed dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory specifications) used for running the experiments.
Software Dependencies | Yes | James MacGlashan. Brown-UMBC reinforcement learning and planning (BURLAP) Java library, version 3.0. Available online at http://burlap.cs.brown.edu, 2016.
Experiment Setup | Yes | All learners were given nt = 32 trajectories for Objectworld and nt = 256 trajectories for Highway, all of length H = 16. ... The agent's chosen action has a 70% probability of success and a 30% probability of a random outcome. The reward is discounted with each time step by a factor of γ = 0.9. (a hedged trajectory-generation sketch with these settings appears after the table)
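
The paper presents Algorithm 1 (ELIRL) only as pseudocode and releases no implementation. The following is a minimal Python sketch of the ELLA-style lifelong update that such an algorithm implies: each new task's single-task IRL solution (reward weights and the Hessian of the IRL objective at those weights) is encoded as a sparse combination of a shared latent basis L, and L is then refreshed in closed form. The class name, the use of scikit-learn's Lasso for the sparse-coding step, and the Cholesky trick for the Hessian weighting are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso


class LifelongIRLSketch:
    """Hedged sketch of an ELLA-style lifelong IRL update: per-task reward
    weights are approximated as theta_t ~= L @ s_t with a shared basis L
    and sparse task codes s_t. Not the authors' released code."""

    def __init__(self, d, k, lam=1e-3, mu=1e-3):
        self.d, self.k = d, k                   # feature dim d, basis size k
        self.lam, self.mu = lam, mu             # L2 penalty on L, L1 penalty on s_t
        self.L = 0.01 * np.random.randn(d, k)   # shared latent reward basis
        self.A = np.zeros((d * k, d * k))       # accumulated quadratic term
        self.b = np.zeros(d * k)                # accumulated linear term
        self.T = 0                              # tasks observed so far

    def observe_task(self, alpha, hessian):
        """alpha: reward weights from a single-task MaxEnt IRL fit (shape d);
        hessian: curvature of that task's IRL objective at alpha (d x d)."""
        # 1) Sparse code for the new task against the current basis.
        #    The Hessian weighting is folded in via a Cholesky factor H ~= R^T R.
        R = np.linalg.cholesky(hessian + 1e-8 * np.eye(self.d)).T
        s = Lasso(alpha=self.mu, fit_intercept=False).fit(R @ self.L, R @ alpha).coef_
        # 2) Accumulate the quadratic system defining the shared basis.
        self.A += np.kron(np.outer(s, s), hessian)
        self.b += np.kron(s, hessian @ alpha)
        self.T += 1
        # 3) Closed-form refresh of L (vec(L) taken in column-major order).
        reg = self.lam * np.eye(self.d * self.k)
        L_vec = np.linalg.solve(self.A / self.T + reg, self.b / self.T)
        self.L = L_vec.reshape(self.d, self.k, order="F")
        return s    # task-specific code; the reward estimate is self.L @ s
```

Note that scikit-learn's Lasso scales its squared loss by 1/(2·n_samples), so the µ used here is not on exactly the same scale as the paper's sparsity parameter.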
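
The experiment-setup row reports the simulation parameters: nt = 32 (Objectworld) or nt = 256 (Highway) demonstration trajectories of length H = 16, a 70% action-success probability with a 30% random outcome, and discount γ = 0.9. The sketch below shows how trajectories with those settings could be rolled out from a given policy; the toy grid domain, policy interface, and reward hook are placeholders, not the BURLAP Objectworld/Highway domains used in the paper.

```python
import numpy as np

GAMMA = 0.9         # discount factor reported in the paper
HORIZON = 16        # trajectory length H
SUCCESS_PROB = 0.7  # chosen action succeeds 70% of the time
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action_idx, grid_size, rng):
    """Noisy dynamics: the chosen action succeeds with probability 0.7,
    otherwise a uniformly random action is applied instead."""
    if rng.random() > SUCCESS_PROB:
        action_idx = int(rng.integers(len(ACTIONS)))
    dr, dc = ACTIONS[action_idx]
    r = min(max(state[0] + dr, 0), grid_size - 1)
    c = min(max(state[1] + dc, 0), grid_size - 1)
    return (r, c)


def simulate(policy, reward_fn, grid_size=16, n_traj=32, seed=0):
    """Roll out n_traj length-H trajectories under a fixed policy and report
    the average discounted return (n_traj = 32 matches the Objectworld
    setting; 256 was used for Highway)."""
    rng = np.random.default_rng(seed)
    trajectories, returns = [], []
    for _ in range(n_traj):
        state = (int(rng.integers(grid_size)), int(rng.integers(grid_size)))
        traj, ret = [], 0.0
        for t in range(HORIZON):
            a = policy(state)
            traj.append((state, a))
            state = step(state, a, grid_size, rng)
            ret += (GAMMA ** t) * reward_fn(state)
        trajectories.append(traj)
        returns.append(ret)
    return trajectories, float(np.mean(returns))
```

For example, `simulate(policy=lambda s: 0, reward_fn=lambda s: 1.0)` rolls out 32 trajectories under a trivial always-up policy with a constant reward, which is enough to exercise the noisy dynamics and discounting.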