Active Advice Seeking for Inverse Reinforcement Learning

Authors: Phillip Odom, Sriraam Natarajan

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Table 1 (Results): "We show the percentage of games won (reached G) for each method while varying the number of advice solicited. ... Our method outperforms both baselines when it can ask for sufficient advice to learn its policy."
Researcher Affiliation | Academia | Phillip Odom and Sriraam Natarajan, School of Informatics and Computing, Indiana University Bloomington (phodom, natarasr@indiana.edu)
Pseudocode | Yes | Algorithm 1: Active Advice Seeking IRL (a Python sketch of this loop follows the table)
  Require: Demonstrations (D), Maximum Advice (M)
  Require: N, the number of advice items to solicit at once
  Require: Expert(S), which returns advice for each s_i in S
  Advice = ∅
  while |Advice| < M do
    Reward = AIRL(D, Advice)
    Uncertainty(x) = u(x)
    S = HighestUncertainty(Uncertainty, N)
    A = Expert(S)
    Advice = Advice ∪ A
  end while
  return AIRL(D, Advice)
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to a code repository for the described methodology.
Open Datasets | No | The paper mentions using 'Wumpus World' for initial results but does not provide concrete access information (link, DOI, repository, or formal citation with authors/year) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | No | The paper describes the general approach but does not provide specific experimental setup details such as concrete hyperparameter values or detailed training configurations.
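
The pseudocode in the table can be read as a simple query loop: repeatedly learn a reward model from demonstrations plus the advice gathered so far, then ask the expert about the states the model is most uncertain about. Below is a minimal Python sketch of that loop under stated assumptions; the function names (active_advice_irl, airl_learn, uncertainty, expert_advice) are hypothetical stand-ins, since the paper does not release an implementation.

# Sketch of the Active Advice Seeking IRL loop (Algorithm 1).
# All helpers passed in (airl_learn, uncertainty, expert_advice) are
# hypothetical callables standing in for components the paper does not specify.

def active_advice_irl(demonstrations, states, expert_advice, airl_learn,
                      uncertainty, max_advice, batch_size):
    """Iteratively solicit expert advice on the most uncertain states.

    demonstrations: expert trajectories (the IRL input)
    states: candidate states the learner may ask about
    expert_advice: maps a list of states to a list of advice items
    airl_learn: advice-aware IRL learner returning a reward/uncertainty model
    uncertainty: scores a state's uncertainty under the current model
    max_advice: maximum total advice to solicit (M)
    batch_size: number of states queried per round (N)
    """
    advice = []
    while len(advice) < max_advice:
        # Learn with the demonstrations and all advice gathered so far.
        model = airl_learn(demonstrations, advice)
        # Rank candidate states by uncertainty and query the top N.
        ranked = sorted(states, key=lambda s: uncertainty(model, s), reverse=True)
        advice.extend(expert_advice(ranked[:batch_size]))
    # Final learning pass with the full advice set.
    return airl_learn(demonstrations, advice)

The design point the algorithm makes is that uncertainty in the currently learned reward, rather than random selection, decides which states are shown to the expert.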