Uncertainty-Aware Reward-Free Exploration with General Function Approximation

Authors: Junkai Zhang, Weitong Zhang, Dongruo Zhou, Quanquan Gu

Venue: ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further implement and evaluate GFA-RFE across various domains and tasks in the DeepMind Control Suite. Experiment results show that GFA-RFE outperforms or is comparable to the performance of state-of-the-art unsupervised RL algorithms.
Researcher Affiliation | Academia | 1 Department of Computer Science, University of California, Los Angeles, California, USA; 2 Department of Computer Science, Indiana University Bloomington, Indiana, USA.
Pseudocode | Yes | Algorithm 1 GFA-RFE
Open Source Code | Yes | Our implementation can be accessed on GitHub at https://github.com/uclaml/GFA-RFE.
Open Datasets | Yes | We conduct our experiments on the Unsupervised Reinforcement Learning Benchmark (Laskin et al., 2021), which consists of two multi-task environments (Walker, Quadruped) from the DeepMind Control Suite (Tunyasuvunakool et al., 2020).
Dataset Splits | No | The paper describes an 'exploration phase' where K episodes are collected and a 'planning phase' for learning a policy, but it does not specify explicit train/validation/test dataset splits by percentage or count for evaluation.
Hardware Specification | No | The paper provides hyper-parameters in Table 3 but does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'Optimizer Adam' and 'DDPG' but does not provide specific version numbers for any software dependencies or libraries used for implementation.
Experiment Setup | Yes | Table 3. The common set of hyper-parameters (illustrative sketches of the network and environment rows follow the table):
    Replay buffer capacity: 10^6
    Action repeat: 1
    n-step returns: 3
    Mini-batch size: 1024
    Discount (γ): 0.99
    Optimizer: Adam
    Learning rate: 10^-4
    Agent update frequency: 2
    Critic target EMA rate (τ_Q): 0.01
    Features dim.: 50
    Hidden dim.: 1024
    Exploration stddev clip: 0.3
    Exploration stddev value: 0.2
    Number of frames per episode: 1 × 10^3
    Number of online exploration frames: up to 1 × 10^6
    Number of offline planning frames: 1 × 10^5
    Critic network: (|O| + |A|) → 1024 → LayerNorm → Tanh → 1024 → ReLU → 1
    Actor network: |O| → 50 → LayerNorm → Tanh → 1024 → ReLU → action dim
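
The two network rows of Table 3 describe plain MLP heads. The sketch below is a hedged PyTorch reading of those rows only; class names, default dimensions, and the tanh action squashing are assumptions rather than details from the paper, whose actual implementation is at https://github.com/uclaml/GFA-RFE.

import torch
import torch.nn as nn


class Critic(nn.Module):
    """Q-network per Table 3: (|O| + |A|) -> 1024 -> LayerNorm -> Tanh -> 1024 -> ReLU -> 1."""

    def __init__(self, obs_dim: int, action_dim: int, hidden_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + action_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Concatenate observation and action features before the Q-value head.
        return self.net(torch.cat([obs, action], dim=-1))


class Actor(nn.Module):
    """Policy network per Table 3: |O| -> 50 -> LayerNorm -> Tanh -> 1024 -> ReLU -> action dim."""

    def __init__(self, obs_dim: int, action_dim: int,
                 feature_dim: int = 50, hidden_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, feature_dim),
            nn.LayerNorm(feature_dim),
            nn.Tanh(),
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Squash to the [-1, 1] action range of DeepMind Control tasks (an assumption here).
        return torch.tanh(self.net(obs))

Under the remaining Table 3 settings, such networks would be trained with the Adam optimizer at learning rate 10^-4 on mini-batches of 1024 transitions, with the critic target tracked at an EMA rate of 0.01.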
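
For the Open Datasets row above: the Walker and Quadruped domains come from the DeepMind Control Suite, distributed as the dm_control package. The minimal sketch below loads one such task and steps it with uniform random actions; it only illustrates the environment interface, not the paper's URLB exploration/planning pipeline.

import numpy as np
from dm_control import suite  # DeepMind Control Suite (Tunyasuvunakool et al., 2020)

# Load one Walker task from the benchmark; Quadruped tasks are loaded the same way.
env = suite.load(domain_name="walker", task_name="walk")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # Uniform random actions stand in here for the GFA-RFE exploration policy.
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)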