Efficient Planning in Large MDPs with Weak Linear Function Approximation
Authors: Roshan Shariff, Csaba Szepesvári
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We design a randomized algorithm that positively answers the challenge posed above under one extra assumption: the feature vectors of all states lie within the convex hull of the feature vectors of a few selected core states that are given to the algorithm. In particular, we show that the query complexity and runtime of our algorithm are polynomial in the relevant quantities and the number of core states, providing a partial positive answer to the previously open problem of efficient planning in the presence of weak features. |
| Researcher Affiliation | Collaboration | Roshan Shariff, University of Alberta & Amii (roshan.shariff@ualberta.ca); Csaba Szepesvári, DeepMind & University of Alberta & Amii (szepesva@ualberta.ca) |
| Pseudocode | Yes | Algorithm 1 CoreStoMP: Stochastic Mirror-Prox for Planning with Core States (a generic illustrative sketch follows the table) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology. |
| Open Datasets | No | This is a theoretical paper presenting an algorithm and its analysis; it does not describe experiments with datasets. |
| Dataset Splits | No | This is a theoretical paper presenting an algorithm and its analysis; it does not describe experiments with datasets, and thus no dataset splits are provided. |
| Hardware Specification | No | The paper is theoretical and does not report on empirical experiments, thus no hardware specifications are provided. |
| Software Dependencies | No | The paper describes a theoretical algorithm (CoreStoMP) and its mathematical properties but does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not report empirical experiments; therefore, no experimental setup details, such as hyperparameters or training settings, are provided. |
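
The paper itself provides no code, so the following is only a minimal, hypothetical sketch of the generic primal-dual template behind Algorithm 1 (CoreStoMP): stochastic mirror-prox applied to a bilinear saddle-point problem, with one player updated in Euclidean geometry and the other kept on a probability simplex (loosely analogous to a distribution over core states). The objective `x^T A y + b^T x + c^T y`, the noise model, the step size, and all names (`stochastic_mirror_prox`, `prox_x`, `prox_y`) are illustrative assumptions and do not reproduce the paper's actual objective, constraint sets, or guarantees.

```python
import numpy as np

def stochastic_mirror_prox(A, b, c, n_iters=2000, eta=0.05, noise=0.1, seed=0):
    """Sketch of stochastic mirror-prox for the bilinear saddle point
        min_x max_y  x^T A y + b^T x + c^T y,
    where x takes plain Euclidean gradient steps and y stays on the
    probability simplex via entropic (multiplicative) updates.
    Illustrative only; not the paper's CoreStoMP formulation.
    """
    rng = np.random.default_rng(seed)
    d, k = A.shape
    x = np.zeros(d)                  # "min" player (e.g. value parameters)
    y = np.full(k, 1.0 / k)          # "max" player on the simplex
    x_avg, y_avg = np.zeros(d), np.zeros(k)

    def grads(x, y):
        # Noisy (unbiased) gradients of the bilinear objective.
        gx = A @ y + b + noise * rng.standard_normal(d)    # d/dx
        gy = A.T @ x + c + noise * rng.standard_normal(k)  # d/dy
        return gx, gy

    def prox_x(x, gx):
        # Euclidean descent step for the min player.
        return x - eta * gx

    def prox_y(y, gy):
        # Entropic ascent step keeping the max player on the simplex.
        z = y * np.exp(eta * gy)
        return z / z.sum()

    for _ in range(n_iters):
        gx, gy = grads(x, y)                    # gradient at current point
        x_half, y_half = prox_x(x, gx), prox_y(y, gy)
        gx2, gy2 = grads(x_half, y_half)        # gradient at look-ahead point
        x, y = prox_x(x, gx2), prox_y(y, gy2)   # extragradient-style update
        x_avg += x_half                         # average the look-ahead iterates
        y_avg += y_half

    return x_avg / n_iters, y_avg / n_iters
```

A call such as `stochastic_mirror_prox(np.random.randn(5, 3), np.zeros(5), np.zeros(3))` returns the averaged iterates, which approximate the saddle point under standard step-size choices; the averaging of look-ahead points is the standard mirror-prox output rule.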