reproducibilityindex.ai

Task-agnostic Exploration in Reinforcement Learning

Authors: Xuezhou Zhang, Yuzhe Ma, Adish Singla

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We present an efficient task-agnostic RL algorithm, UCBZERO, that finds ε-optimal policies for N arbitrary tasks after at most O(log(N) H^5 S A / ε^2) exploration episodes, where H is the episode length, S is the state space size, and A is the action space size. We also provide an Ω(log(N) H^2 S A / ε^2) lower bound, showing that the log dependency on N is unavoidable.
Researcher Affiliation	Academia	Xuezhou Zhang (UW-Madison, xzhang784@wisc.edu); Yuzhe Ma (UW-Madison, ma234@wisc.edu); Adish Singla (MPI-SWS, adishs@mpi-sws.org)
Pseudocode	Yes	Algorithm 1 UCBZERO
Open Source Code	No	The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets	No	The paper is theoretical and does not use any datasets for experiments, nor does it provide access information for any dataset.
Dataset Splits	No	The paper is theoretical and does not describe any experimental setup involving dataset splits.
Hardware Specification	No	The paper does not provide any specific details about the hardware used to conduct research or simulations.
Software Dependencies	No	The paper does not specify any software dependencies with version numbers.
Experiment Setup	No	The paper does not provide specific experimental setup details such as hyperparameter values, training configurations, or system-level settings, as it is a theoretical work.
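
The two sample-complexity expressions quoted in the Research Type row can be compared numerically. The sketch below is illustrative only: it evaluates the expressions with all hidden constants (and any suppressed logarithmic factors) set to 1, so the outputs show scaling behaviour, not actual episode counts. The function names and the example values are hypothetical and do not come from the paper.

```python
import math


def upper_bound_scale(n_tasks, horizon, n_states, n_actions, eps):
    """Scale of the upper bound: log(N) * H^5 * S * A / eps^2.

    Assumption: hidden constants are taken to be 1. Requires n_tasks > 1
    so that log(N) is positive.
    """
    return math.log(n_tasks) * horizon**5 * n_states * n_actions / eps**2


def lower_bound_scale(n_tasks, horizon, n_states, n_actions, eps):
    """Scale of the lower bound: log(N) * H^2 * S * A / eps^2.

    Same assumption as above: constants are taken to be 1.
    """
    return math.log(n_tasks) * horizon**2 * n_states * n_actions / eps**2


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    H, S, A, eps = 10, 20, 4, 0.1

    for N in (10, 100, 10_000):
        up = upper_bound_scale(N, H, S, A, eps)
        lo = lower_bound_scale(N, H, S, A, eps)
        # The ratio of the two expressions is H^3, independent of N, S, A, eps.
        print(f"N={N:>6}  upper~{up:.3e}  lower~{lo:.3e}  ratio~{up / lo:.1f}")
```

Under these assumptions, squaring the number of tasks (for example, going from N = 10 to N = 100) only doubles both expressions, which is what a logarithmic dependence on N means in practice. The gap between the two expressions is a factor of H^3.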