Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

Authors: Daniel S. Brown, Scott Niekum (pp. 7749-7758)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We apply our proposed machine teaching algorithm to two novel applications: providing a lower bound on the number of queries needed to learn a policy using active IRL and developing a novel IRL algorithm that can learn more efficiently from informative demonstrations than a standard IRL approach." and "Table 1: Comparison of Uncertainty Volume Minimization (UVM) and Set Cover Optimal Teaching (SCOT) averaged across 20 random 9x9 grid worlds with 8-dimensional features. UVM(x) was run using x Monte Carlo samples. UVM underestimates the number of (s, a) pairs needed to teach π."
Researcher Affiliation | Academia | Daniel S. Brown, Scott Niekum, Department of Computer Science, University of Texas at Austin, {dsbrown,sniekum}@cs.utexas.edu
Pseudocode | Yes | Algorithm 1: Set Cover Optimal Teaching (SCOT)
Open Source Code | No | The paper refers to an arXiv preprint of the full paper but contains no explicit statement about releasing source code for the described methodology, and no link to a code repository.
Open Datasets | No | The experiments were conducted in custom simulated environments (self-generated random 9x9 and 10x10 grid worlds with varying features, and a ball-sorting task) rather than on publicly available datasets with concrete access information (links, DOIs, or formal citations).
Dataset Splits | No | The paper describes experiments in simulated environments and the generation of demonstrations (e.g., "generated demonstrations from 50 random rewards"), but it gives no details on training, validation, or test splits (percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning.
Hardware Specification | No | The paper provides no hardware details such as GPU/CPU models, processor types, or memory amounts used to run its experiments.
Software Dependencies | No | The paper names no ancillary software, such as libraries or solvers with version numbers, needed to replicate the experiments.
Experiment Setup | No | The paper identifies λ as a hyperparameter ("where λ ≥ 0 is a hyperparameter modeling the confidence that the demonstrations are informative") but does not report the values used in the experiments; no other concrete hyperparameter values, training configurations, or system-level settings are given.
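The SCOT algorithm named in the Pseudocode row casts the selection of maximally informative demonstrations as a set cover problem. As a rough illustration only, assuming each candidate demonstration "covers" a set of reward-constraint indices (the candidates and constraint sets below are hypothetical, not taken from the paper), the greedy set-cover subroutine such an algorithm relies on might look like:

```python
# Illustrative greedy set cover: repeatedly pick the candidate demonstration
# that covers the most still-uncovered constraints. The constraint sets here
# are made up for the example, not derived from the paper's grid worlds.

def greedy_set_cover(universe, candidates):
    """Return candidate keys chosen greedily until `universe` is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Candidate with the largest marginal coverage of remaining constraints.
        best = max(candidates, key=lambda k: len(candidates[k] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:  # remaining constraints cannot be covered by any candidate
            break
        chosen.append(best)
        uncovered -= gain
    return chosen

# Hypothetical example: 5 constraints, 4 candidate demonstrations.
constraints = {0, 1, 2, 3, 4}
demos = {"d1": {0, 1}, "d2": {1, 2, 3}, "d3": {3, 4}, "d4": {0, 4}}
print(greedy_set_cover(constraints, demos))
```

The greedy rule gives the standard logarithmic approximation guarantee for set cover; how the actual SCOT algorithm derives the constraint sets from the teacher's policy is omitted in this sketch.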