Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration
Authors: Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill
Venue: NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation... This work makes two contributions. It presents a statistically and computationally efficient online PAC algorithm... Before presenting the main result, it is useful to define the average feature $\bar{\phi}_{\pi,t} = \mathbb{E}_{x_t \sim \pi}\, \phi_t(x_t, \pi_t(x_t))$ encountered at timestep $t$ upon following a certain policy $\pi$. In addition, we need a way to measure how explorable the space is... Theorem 4.1. |
| Researcher Affiliation | Collaboration | Andrea Zanette (Stanford University, zanette@stanford.edu); Alessandro Lazaric (Facebook Artificial Intelligence Research, lazaric@fb.com); Mykel J. Kochenderfer (Stanford University, mykel@stanford.edu); Emma Brunskill (Stanford University, ebrun@cs.stanford.edu) |
| Pseudocode | Yes | Algorithm 1 Forward Reward Agnostic Navigation with Confidence by Injecting Stochasticity (FRANCIS) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not use a concrete dataset; therefore, no information about public availability of training data is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments, so no information on dataset splits (train/validation/test) is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or hardware used. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper describes a theoretical algorithm and provides proofs, but does not detail an experimental setup with specific hyperparameters or training configurations. |
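
The average feature quoted in the Research Type row is the expected feature vector at timestep t when states are drawn by following a policy. The sketch below is not the paper's FRANCIS algorithm. It only illustrates that definition on a small finite MDP with a deterministic policy, by propagating the state distribution forward. The function name `average_features`, the array layouts, and the toy two-state MDP with one-hot features are all assumptions made for this illustration.

```python
import numpy as np


def average_features(p0, P, policy, phi):
    """Average feature at each timestep under a deterministic policy.

    Hypothetical array layout (an assumption of this sketch):
      p0:     (S,)            initial state distribution
      P:      (H-1, S, A, S)  transition probabilities P[t, s, a, s']
      policy: (H, S)          policy[t, s] = action taken in state s at timestep t
      phi:    (H, S, A, d)    feature map phi[t, s, a]
    """
    H, S = policy.shape
    d = phi.shape[-1]
    states = np.arange(S)
    dist = np.asarray(p0, dtype=float)  # distribution of x_t, starting at t = 0
    out = np.zeros((H, d))
    for t in range(H):
        # Features of the state-action pairs the policy selects at timestep t.
        chosen = phi[t, states, policy[t]]  # shape (S, d)
        # Expectation over x_t of phi_t(x_t, pi_t(x_t)).
        out[t] = dist @ chosen
        if t + 1 < H:
            # Push the state distribution one step forward under the policy.
            dist = dist @ P[t, states, policy[t]]  # (S,) @ (S, S)
    return out


# Toy example: 2 states, 2 actions, horizon 2, one-hot features over (s, a).
H, S, A = 2, 2, 2
phi = np.tile(np.eye(S * A).reshape(S, A, S * A), (H, 1, 1, 1))
P = np.zeros((H - 1, S, A, S))
P[0, 0, 0] = [1.0, 0.0]
P[0, 0, 1] = [0.25, 0.75]
P[0, 1, 0] = [0.0, 1.0]
P[0, 1, 1] = [0.0, 1.0]
policy = np.array([[1, 0],   # at t = 0: state 0 takes action 1, state 1 takes action 0
                   [0, 0]])  # at t = 1: both states take action 0
p0 = np.array([1.0, 0.0])

avg = average_features(p0, P, policy, phi)
print(avg)
```

With one-hot features, each row of `avg` is simply the probability of visiting each state-action pair at that timestep, which makes the definition easy to check by hand.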