Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

Authors: Daniel S. Brown, Scott Niekum (pp. 7749-7758)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We apply our proposed machine teaching algorithm to two novel applications: providing a lower bound on the number of queries needed to learn a policy using active IRL and developing a novel IRL algorithm that can learn more efficiently from informative demonstrations than a standard IRL approach." and "Table 1: Comparison of Uncertainty Volume Minimization (UVM) and Set Cover Optimal Teaching (SCOT) averaged across 20 random 9x9 grid worlds with 8-dimensional features. UVM(x) was run using x Monte Carlo samples. UVM underestimates the number of (s, a) pairs needed to teach π."
Researcher Affiliation | Academia | Daniel S. Brown, Scott Niekum, Department of Computer Science, University of Texas at Austin, {dsbrown,sniekum}@cs.utexas.edu
Pseudocode | Yes | Algorithm 1: Set Cover Optimal Teaching (SCOT)
Open Source Code | No | The paper refers to an arXiv preprint of the full paper but contains no explicit statement about releasing source code for the described methodology, and no link to a code repository.
Open Datasets | No | The experiments were conducted in custom simulated environments (self-generated random 9x9 and 10x10 grid worlds with varying features, and a ball-sorting task) rather than on publicly available datasets with concrete access information (links, DOIs, or formal citations).
Dataset Splits | No | The paper describes experiments in simulated environments and the generation of demonstrations (e.g., "generated demonstrations from 50 random rewards"), but it gives no details on training, validation, or test splits (percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning.
Hardware Specification | No | The paper provides no hardware details such as GPU/CPU models, processor types, or memory amounts used to run its experiments.
Software Dependencies | No | The paper names no ancillary software, such as libraries or solvers with version numbers, needed to replicate the experiments.
Experiment Setup | No | The paper identifies λ as a hyperparameter ("where λ ≥ 0 is a hyperparameter modeling the confidence that the demonstrations are informative") but does not report the values used in the experiments; no other concrete hyperparameter values, training configurations, or system-level settings are given.
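The SCOT algorithm named in the Pseudocode row casts the selection of maximally informative demonstrations as a set cover problem. As a rough illustration only, assuming each candidate demonstration "covers" a set of reward-constraint indices (the candidates and constraint sets below are hypothetical, not taken from the paper), the greedy set-cover subroutine such an algorithm relies on might look like:

```python
# Illustrative greedy set cover: repeatedly pick the candidate demonstration
# that covers the most still-uncovered constraints. The constraint sets here
# are made up for the example, not derived from the paper's grid worlds.

def greedy_set_cover(universe, candidates):
    """Return candidate keys chosen greedily until `universe` is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Candidate with the largest marginal coverage of remaining constraints.
        best = max(candidates, key=lambda k: len(candidates[k] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:  # remaining constraints cannot be covered by any candidate
            break
        chosen.append(best)
        uncovered -= gain
    return chosen

# Hypothetical example: 5 constraints, 4 candidate demonstrations.
constraints = {0, 1, 2, 3, 4}
demos = {"d1": {0, 1}, "d2": {1, 2, 3}, "d3": {3, 4}, "d4": {0, 4}}
print(greedy_set_cover(constraints, demos))
```

The greedy rule gives the standard logarithmic approximation guarantee for set cover; how the actual SCOT algorithm derives the constraint sets from the teacher's policy is omitted in this sketch.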