Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework

Authors: Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

AAAI 2021, pp. 10859-10867 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that our exploration algorithm is effective and sample efficient, and results in superior policies for arbitrary reward functions in the planning phase. We conduct experiments on several environments with discrete, continuous or high-dimensional state spaces.
Researcher Affiliation | Academia | Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li; Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
Pseudocode | Yes | Algorithm 1: Maximizing the state-action space Rényi entropy for the reward-free RL framework (a hedged sketch of the Rényi entropy objective appears after this table).
Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | We first conduct experiments on the MultiRoom environment from minigrid (Chevalier-Boisvert, Willems, and Pal 2018)... Then, we conduct experiments on a set of Atari (with image-based observations) (Machado et al. 2018) and Mujoco (Todorov, Erez, and Tassa 2012) tasks (an illustrative environment-setup sketch appears after this table).
Dataset Splits | No | The paper describes collecting samples and datasets (e.g., 'collect a dataset with 100M (5M) samples'), but it does not specify explicit training/validation/test splits (e.g., an '80/10/10 split' or exact per-split sample counts) in the main text, so the data partitioning cannot be reproduced exactly.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies with version numbers (e.g., specific library or solver versions) needed to replicate the experiments.
Experiment Setup | Yes | In the exploration phase, we run different exploration algorithms in the reward-free environment of Atari (Mujoco) for 200M (10M) steps and collect a dataset with 100M (5M) samples by executing the learned policy. More experiments and the detailed experiment settings/hyperparameters can be found in Appendix G.
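
For reference, the objective named in Algorithm 1 is built around the Rényi entropy of a state-action visitation distribution. The snippet below is a minimal sketch of that quantity for a tabular (discrete) distribution, assuming only the standard definition H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha); the function name, the example counts, and the choice alpha = 0.5 are illustrative and not taken from the paper.

```python
import numpy as np

def renyi_entropy(p, alpha=0.5, eps=1e-12):
    """Renyi entropy H_alpha(p) = log(sum_i p_i**alpha) / (1 - alpha).

    As alpha -> 1 this recovers the Shannon entropy -sum_i p_i * log(p_i).
    """
    p = np.asarray(p, dtype=np.float64)
    p = p / p.sum()                      # normalize counts to a valid distribution
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p + eps)))   # Shannon limit
    return float(np.log(np.sum(p ** alpha) + eps) / (1.0 - alpha))

# Hypothetical visitation counts over a 4-state x 2-action tabular MDP.
counts = np.array([[10.0, 2.0],
                   [5.0, 5.0],
                   [1.0, 1.0],
                   [0.0, 6.0]])
print(renyi_entropy(counts.flatten(), alpha=0.5))
```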
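
Similarly, the benchmark suites quoted under Open Datasets are standard Gym-registered environments. The snippet below is an environment-setup sketch only: the exact environment IDs, the gym/gym_minigrid versions, and the pre-0.26 step API are assumptions, since the paper does not pin software versions (see Software Dependencies above).

```python
import gym
import gym_minigrid  # registers the MiniGrid environments (Chevalier-Boisvert et al. 2018)

# Environment IDs are assumptions chosen for illustration; the paper does not list exact IDs.
multiroom = gym.make("MiniGrid-MultiRoom-N6-v0")    # discrete grid-world observations
atari = gym.make("MontezumaRevengeNoFrameskip-v4")  # high-dimensional image observations
mujoco = gym.make("HalfCheetah-v2")                 # continuous state/action spaces

obs = multiroom.reset()
for _ in range(10):
    obs, reward, done, info = multiroom.step(multiroom.action_space.sample())
    if done:
        obs = multiroom.reset()
```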