Active Advice Seeking for Inverse Reinforcement Learning

Authors: Phillip Odom, Sriraam Natarajan

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Table 1 (Results): "We show the percentage of games won (reached G) for each method while varying the number of advice solicited. ... Our method outperforms both baselines when it can ask for sufficient advice to learn its policy."
Researcher Affiliation | Academia | Phillip Odom and Sriraam Natarajan, School of Informatics and Computing, Indiana University Bloomington (phodom, natarasr@indiana.edu)
Pseudocode | Yes | Algorithm 1: Active Advice Seeking IRL (a Python sketch of this loop follows the table)
  Require: Demonstrations (D), Maximum Advice (M)
  Require: N, the number of advice items to solicit at once
  Require: Expert(S), which returns advice for each s_i in S
  Advice = ∅
  while |Advice| < M do
    Reward = AIRL(D, Advice)
    Uncertainty(x) = u(x)
    S = HighestUncertainty(Uncertainty, N)
    A = Expert(S)
    Advice = Advice ∪ A
  end while
  return AIRL(D, Advice)
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to a code repository for the described methodology.
Open Datasets | No | The paper mentions using 'Wumpus World' for initial results but does not provide concrete access information (link, DOI, repository, or formal citation with authors/year) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | No | The paper describes the general approach but does not provide specific experimental setup details such as concrete hyperparameter values or detailed training configurations.
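
The pseudocode in the table can be read as a simple query loop: repeatedly learn a reward model from demonstrations plus the advice gathered so far, then ask the expert about the states the model is most uncertain about. Below is a minimal Python sketch of that loop under stated assumptions; the function names (active_advice_irl, airl_learn, uncertainty, expert_advice) are hypothetical stand-ins, since the paper does not release an implementation.

# Sketch of the Active Advice Seeking IRL loop (Algorithm 1).
# All helpers passed in (airl_learn, uncertainty, expert_advice) are
# hypothetical callables standing in for components the paper does not specify.

def active_advice_irl(demonstrations, states, expert_advice, airl_learn,
                      uncertainty, max_advice, batch_size):
    """Iteratively solicit expert advice on the most uncertain states.

    demonstrations: expert trajectories (the IRL input)
    states: candidate states the learner may ask about
    expert_advice: maps a list of states to a list of advice items
    airl_learn: advice-aware IRL learner returning a reward/uncertainty model
    uncertainty: scores a state's uncertainty under the current model
    max_advice: maximum total advice to solicit (M)
    batch_size: number of states queried per round (N)
    """
    advice = []
    while len(advice) < max_advice:
        # Learn with the demonstrations and all advice gathered so far.
        model = airl_learn(demonstrations, advice)
        # Rank candidate states by uncertainty and query the top N.
        ranked = sorted(states, key=lambda s: uncertainty(model, s), reverse=True)
        advice.extend(expert_advice(ranked[:batch_size]))
    # Final learning pass with the full advice set.
    return airl_learn(demonstrations, advice)

The design point the algorithm makes is that uncertainty in the currently learned reward, rather than random selection, decides which states are shown to the expert.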