Task-agnostic Exploration in Reinforcement Learning
Authors: Xuezhou Zhang, Yuzhe Ma, Adish Singla
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present an efficient task-agnostic RL algorithm, UCBZERO, that finds ε-optimal policies for N arbitrary tasks after at most O(log(N)H^5SA/ε^2) exploration episodes, where H is the episode length, S is the state space size, and A is the action space size. We also provide an Ω(log(N)H^2SA/ε^2) lower bound, showing that the log dependency on N is unavoidable. |
| Researcher Affiliation | Academia | Xuezhou Zhang (UW-Madison, xzhang784@wisc.edu); Yuzhe Ma (UW-Madison, ma234@wisc.edu); Adish Singla (MPI-SWS, adishs@mpi-sws.org) |
| Pseudocode | Yes | Algorithm 1: UCBZERO (a hedged code sketch is given after the table) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not use any datasets for experiments, nor does it provide access information for any dataset. |
| Dataset Splits | No | The paper is theoretical and does not describe any experimental setup involving dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to conduct research or simulations. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings, as it is a theoretical work. |
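
For concreteness, the sketch below illustrates the kind of reward-free exploration loop the Pseudocode row refers to: optimistic tabular Q-learning whose bonuses shrink with visit counts, run with no reward signal, so the collected trajectories can later be replayed against any of the N task rewards. This is a minimal sketch under assumptions, not the paper's exact Algorithm 1: the environment interface (`env.reset()`, `env.step()`), the bonus constant `c`, and the failure probability `delta` are illustrative choices.

```python
import numpy as np

def ucbzero_explore(env, S, A, H, K, c=1.0, delta=0.1):
    """Reward-free exploration phase of a UCBZERO-style algorithm (sketch).

    Runs K episodes in a tabular episodic MDP, acting greedily with respect
    to optimistic Q-values. No reward is observed or used; the returned
    trajectories can be replayed offline with any task reward.
    `env.reset() -> state` and `env.step(a) -> next_state` are assumed
    interfaces, not the paper's API.
    """
    iota = np.log(S * A * H * K / delta)          # log factor in the bonus
    Q = np.full((H, S, A), float(H))              # optimistic initialization
    V = np.zeros((H + 1, S))
    counts = np.zeros((H, S, A), dtype=int)
    trajectories = []

    for _ in range(K):
        s = env.reset()
        episode = []
        for h in range(H):
            a = int(np.argmax(Q[h, s]))           # greedy w.r.t. optimistic Q
            s_next = env.step(a)                  # reward is ignored (task-agnostic)
            counts[h, s, a] += 1
            t = counts[h, s, a]
            alpha = (H + 1) / (H + t)             # Q-learning-with-UCB step size
            bonus = c * np.sqrt(H**3 * iota / t)  # exploration bonus, shrinks with visits
            # Reward-free update: the reward term is identically zero.
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * (V[h + 1, s_next] + bonus)
            V[h, s] = min(H, Q[h, s].max())
            episode.append((h, s, a, s_next))
            s = s_next
        trajectories.append(episode)
    return trajectories
```

A downstream planning step would replay the stored trajectories once per task with that task's reward function; a union bound over the N tasks is what introduces the log(N) factor in the O(log(N)H^5SA/ε^2) sample-complexity bound quoted above.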