Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling

Authors: Adrian Šošić, Elmar Rueckert, Jan Peters, Abdelhak M. Zoubir, Heinz Koeppl

JMLR 2018 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In our experimental study, we compare the proposed approach with common baseline methods on a variety of benchmark tasks and real-world scenarios. The results reveal that our approach performs significantly better than the original BNIRL model and alternative IRL solutions on all considered tasks. Interestingly enough, our algorithm outperforms the baselines even when the expert's true reward structure is dense and the underlying subgoal assumption is violated.
Researcher Affiliation Academia Adrian Šošić (EMAIL) and Abdelhak M. Zoubir (EMAIL), Signal Processing Group, Technische Universität Darmstadt, 64283 Darmstadt, Germany; Elmar Rueckert (EMAIL), Institute for Robotics and Cognitive Systems, Universität zu Lübeck, 23538 Lübeck, Germany; Jan Peters (EMAIL), Autonomous Systems Labs, Technische Universität Darmstadt, 64289 Darmstadt, Germany; Heinz Koeppl (EMAIL), Bioinspired Communication Systems, Technische Universität Darmstadt, 64283 Darmstadt, Germany
Pseudocode No The paper describes methods like Gibbs sampling and conditional probability distributions, but does not present any explicit pseudocode blocks or algorithms with numbered steps.
Open Source Code No The paper states: "Videos of all demonstrated tasks can be found at http://www.spg.tu-darmstadt.de/jmlr2018." This link provides access to videos of tasks, not the source code for the methodology described in the paper.
Open Datasets No The paper mentions generating random MDPs, citing a "BNIRL data set (Michini and How, 2012)" as a reference, and collecting data on a "KUKA lightweight robotic arm." However, it does not provide concrete access information (links, DOIs, repositories) that would make these datasets publicly available for replication.
Dataset Splits No The paper mentions generating "a number of expert trajectories of length 10" for the random MDP scenario and a "manual segmentation of all recorded trajectories" for the robot experiment. However, it does not specify any training/test/validation dataset splits (e.g., percentages, sample counts, or references to standard predefined splits) needed for reproducibility.
Hardware Specification No The paper mentions using a "KUKA lightweight robotic arm" for data collection in the robot experiment. However, it does not specify any hardware details (e.g., CPU, GPU models, memory, or specific computer specifications) used to run the computational experiments or train the models.
Software Dependencies No The paper does not name specific software packages with version numbers. It mentions concepts like a "Gibbs chain" and the "value iteration algorithm" but does not identify the software used to implement them or its version details.
Experiment Setup No The paper discusses various model parameters (e.g., discount factor γ = 0.9, uncertainty coefficient β, self-link parameter ν, constant κ for the score function) but does not provide a comprehensive set of specific hyperparameters or system-level training settings in a clearly labeled section or table that would allow for full reproduction of experiments.