Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
Authors: Dingwen Kong, Lin F. Yang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we focus on addressing this issue from a theoretical perspective, aiming to provide provably feedback-efficient algorithmic frameworks that take human-in-the-loop to specify rewards of given tasks. |
| Researcher Affiliation | Academia | Dingwen Kong School of Mathematical Sciences Peking University dingwenk@pku.edu.cn Lin F. Yang Department of Electrical and Computer Engineering University of California, Los Angeles linyang@ee.ucla.edu |
| Pseudocode | Yes | Algorithm 1 Active Reward Learning(Z, ε, δ) (see the Python sketch after the table) |
| Open Source Code | Yes | The source code is included in the supplementary material. One may run Figure1.m and Figure2.m to reproduce the results in Figure 1 and Figure 2, respectively. |
| Open Datasets | No | We consider a tabular MDP with linear reward. The details of the experiments are deferred to Appendix A. Here we highlight three main points derived from the experiment. |
| Dataset Splits | No | We train for K = 2000 episodes for the first phase, and run 100 trials. |
| Hardware Specification | No | The amount of compute is negligible since the environment is very small. Our results can be easily reproduced in a personal laptop. |
| Software Dependencies | No | The source code is written in MATLAB. |
| Experiment Setup | Yes | The dimension of the linear MDP is d = 2. The horizon H = 5. The action space has |A| = 2 actions. The feature map ϕ(s, a) is defined as ϕ(s, a) = [s, a] where s is the state and a is the action. For the tabular case, we set the number of states S = 100. |
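
The Pseudocode row refers to Algorithm 1, Active Reward Learning(Z, ε, δ). The paper's exact procedure is not reproduced here; the snippet below is a minimal Python sketch of a generic uncertainty-thresholded query rule in that spirit, assuming a linear reward model, a ridge-regularized design matrix, and an illustrative threshold built from ε and δ. The names `query_oracle`, `lam`, and `threshold`, and the specific threshold formula, are assumptions rather than the authors' notation.

```python
import numpy as np

def active_reward_learning(Z, eps, delta, query_oracle, d):
    """Hedged sketch of an uncertainty-based query rule.

    Z            : iterable of feature vectors phi(s, a) gathered during exploration
    eps, delta   : target accuracy and failure probability (assumed roles)
    query_oracle : callable phi -> noisy human reward label (hypothetical helper)
    d            : feature dimension
    """
    lam = 1.0                      # ridge parameter (assumed)
    Sigma = lam * np.eye(d)        # regularized design matrix
    X, y = [], []

    # Stand-in threshold; the paper's constant depends on eps, delta, and problem scale.
    threshold = eps / np.sqrt(np.log(1.0 / delta) + 1.0)

    for phi in Z:
        phi = np.asarray(phi, dtype=float)
        # Elliptical-potential "uncertainty" of the current reward estimate at phi.
        bonus = np.sqrt(phi @ np.linalg.solve(Sigma, phi))
        if bonus > threshold:      # ask for human feedback only when uncertain
            X.append(phi)
            y.append(query_oracle(phi))
            Sigma += np.outer(phi, phi)

    # Ridge estimate of the linear reward parameter from the queried labels only.
    if X:
        X, y = np.array(X), np.array(y)
        theta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    else:
        theta_hat = np.zeros(d)
    return theta_hat
```

The design point this sketch illustrates is that a human label is requested only when the uncertainty bonus for a candidate feature exceeds the threshold, which is what keeps the number of feedback queries small relative to the number of environment interactions.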
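For the Experiment Setup row, the stated quantities (d = 2, H = 5, |A| = 2, S = 100, φ(s, a) = [s, a], K = 2000 episodes, 100 trials) can be collected into a small configuration. The snippet below is a Python sketch of that setup, not the authors' MATLAB scripts (Figure1.m, Figure2.m); the reward parameter, noise-free reward, and random seed are placeholder assumptions.

```python
import numpy as np

# Reported configuration from the paper's experiment setup.
d = 2            # dimension of the linear reward
H = 5            # horizon
S = 100          # number of states (tabular case)
A = 2            # number of actions
K = 2000         # episodes in the first phase
n_trials = 100   # independent trials

def phi(s, a):
    """Feature map phi(s, a) = [s, a] as described in the setup."""
    return np.array([s, a], dtype=float)

rng = np.random.default_rng(0)        # seed is an assumption
theta_star = rng.normal(size=d)       # unknown linear reward parameter (placeholder)

def reward(s, a):
    # Linear reward r(s, a) = <phi(s, a), theta*>; any noise model is assumed.
    return phi(s, a) @ theta_star
```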