Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration

Authors: Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation... This work makes two contributions. It presents a statistically and computationally efficient online PAC algorithm... Before presenting the main result, it is useful to define the average feature φ̄_{π,t} = E_{x_t}[φ_t(x_t, π(x_t))] encountered at timestep t upon following a certain policy π. In addition, we need a way to measure how explorable the space is... Theorem 4.1.
Researcher Affiliation | Collaboration | Andrea Zanette (Stanford University, EMAIL); Alessandro Lazaric (Facebook Artificial Intelligence Research, EMAIL); Mykel J. Kochenderfer (Stanford University, EMAIL); Emma Brunskill (Stanford University, EMAIL)
Pseudocode | Yes | Algorithm 1: Forward Reward Agnostic Navigation with Confidence by Injecting Stochasticity (FRANCIS)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | No | The paper is theoretical and does not use a concrete dataset; therefore, no information about the public availability of training data is provided.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments, so no train/validation/test split information is provided.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or the hardware used.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper describes a theoretical algorithm with proofs but does not detail an experimental setup with specific hyperparameters or training configurations.
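The average-feature quantity quoted in the Research Type excerpt can be written out more fully. The following is a hedged reconstruction from the (partially garbled) excerpt, not the paper's verbatim definitions; in particular, the explorability coefficient ν_min below is an assumption about how the paper quantifies "how explorable the space is":

```latex
% Average feature at timestep t under a policy \pi
% (reconstructed from the excerpt's garbled inline formula):
\bar{\phi}_{\pi,t} \;=\; \mathbb{E}_{x_t \sim \pi}\!\left[\,\phi_t\bigl(x_t, \pi(x_t)\bigr)\right]

% Assumed explorability measure: the worst-case (over unit directions
% \theta) best-achievable (over policies \pi) alignment of the average
% feature with \theta:
\nu_{\min} \;=\; \min_{\lVert\theta\rVert_2 = 1}\; \max_{\pi}\; \bigl|\langle \bar{\phi}_{\pi,t},\, \theta \rangle\bigr|
```

Intuitively, a larger ν_min means every direction of the feature space can be reached in expectation by some policy, which is the kind of condition a reward-agnostic exploration guarantee like Theorem 4.1 would rely on.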