Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

Authors: Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

NeurIPS 2020

Reproducibility Variable | Result | LLM Response

Research Type | Theoretical | There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation... This work makes two contributions. It presents a statistically and computationally efficient online PAC algorithm... Before presenting the main result it is useful to define the average feature φ̄_{π,t} = E_{x_t ∼ π}[φ_t(x_t, π_t(x_t))] encountered at timestep t upon following a certain policy π. In addition, we need a way to measure how explorable the space is... Theorem 4.1.

Researcher Affiliation | Collaboration | Andrea Zanette (Stanford University, zanette@stanford.edu); Alessandro Lazaric (Facebook Artificial Intelligence Research, lazaric@fb.com); Mykel J. Kochenderfer (Stanford University, mykel@stanford.edu); Emma Brunskill (Stanford University, ebrun@cs.stanford.edu)

Pseudocode | Yes | Algorithm 1: Forward Reward Agnostic Navigation with Confidence by Injecting Stochasticity (FRANCIS)

Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described.

Open Datasets | No | The paper is theoretical and does not use a concrete dataset; therefore, no information about public availability of training data is provided.

Dataset Splits | No | The paper is theoretical and does not describe empirical experiments, so no train/validation/test splits are reported.

Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware used.

Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.

Experiment Setup | No | The paper describes a theoretical algorithm and provides proofs, but does not detail an experimental setup with specific hyperparameters or training configurations.
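The average feature φ̄_{π,t} = E_{x_t ∼ π}[φ_t(x_t, π_t(x_t))] quoted in the Research Type row is an expectation over states reached at timestep t under policy π. It can be approximated by Monte Carlo rollouts; the sketch below assumes a generic, hypothetical environment interface (`reset`, `step`), a `policy(x, h)` function, and a featurizer `phi(x, a)`, none of which come from the paper:

```python
import numpy as np

def average_feature(reset, step, policy, phi, t, d, n_rollouts=1000):
    """Monte Carlo estimate of phi_bar_{pi,t} = E[phi_t(x_t, pi_t(x_t))].

    reset() -> initial state; step(x, a) -> next state (hypothetical interface).
    policy(x, h) -> action taken at timestep h; phi(x, a) -> length-d feature vector.
    """
    total = np.zeros(d)
    for _ in range(n_rollouts):
        x = reset()
        for h in range(t):              # roll the policy forward to timestep t
            x = step(x, policy(x, h))
        total += phi(x, policy(x, t))   # feature of (state, action) at timestep t
    return total / n_rollouts
```

For example, on a deterministic 5-state chain with one-hot state features, the estimate at t = 2 is exactly the indicator of state 2, since every rollout lands there.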