reproducibilityindex.ai

Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs

Authors: Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Basar

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical experiments validate the advantages of Restart Q-UCB in terms of both cumulative rewards and computational efficiency. We conduct simulations showing that Restart Q-UCB achieves highly competitive cumulative rewards against a state-of-the-art solution (Zhou et al., 2020), while only taking 0.18% of its computation time. In this section, we empirically evaluate Restart Q-UCB on reinforcement learning tasks with various types of non-stationarity.
Researcher Affiliation	Academia	(1) Department of Electrical and Computer Engineering & Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, IL, USA; (2) Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, USA.
Pseudocode	Yes	Algorithm 1: Restart Q-UCB (Hoeffding)
Open Source Code	No	The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	We evaluate the cumulative rewards of the four algorithms on a variant of a reinforcement learning task named Bidirectional Diabolical Combination Lock (Agarwal et al., 2020; Misra et al., 2020).
Dataset Splits	No	The paper mentions evaluating algorithms on a task and averaging results over runs, but it does not specify any training, validation, or test dataset splits.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies	No	The paper does not list any specific software components with version numbers (e.g., programming languages, libraries, frameworks) used in the experiments.
Experiment Setup	No	A detailed discussion on the task settings as well as the configuration of the hyper-parameters is deferred to Appendix I.