Uncertainty-Aware Reward-Free Exploration with General Function Approximation
Authors: Junkai Zhang, Weitong Zhang, Dongruo Zhou, Quanquan Gu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further implement and evaluate GFA-RFE across various domains and tasks in the DeepMind Control Suite. Experimental results show that GFA-RFE outperforms or is comparable to state-of-the-art unsupervised RL algorithms. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of California, Los Angeles, California, USA 2Department of Computer Science, Indiana University Bloomington, Indiana, USA. |
| Pseudocode | Yes | Algorithm 1 GFA-RFE |
| Open Source Code | Yes | Our implementation can be accessed on GitHub at https://github.com/uclaml/GFA-RFE. |
| Open Datasets | Yes | We conduct our experiments on the Unsupervised Reinforcement Learning Benchmark (Laskin et al., 2021), which consists of two multi-task environments (Walker, Quadruped) from the DeepMind Control Suite (Tunyasuvunakool et al., 2020). |
| Dataset Splits | No | The paper describes an 'exploration phase' where K episodes are collected and a 'planning phase' for learning a policy, but it does not specify explicit train/validation/test dataset splits by percentage or count for evaluation. |
| Hardware Specification | No | The paper provides hyper-parameters in Table 3 but does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Optimizer Adam' and 'DDPG' but does not provide specific version numbers for any software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | Table 3. The common set of hyper-parameters: Replay buffer capacity 10^6; Action repeat 1; n-step returns 3; Mini-batch size 1024; Discount (γ) 0.99; Optimizer Adam; Learning rate 10^-4; Agent update frequency 2; Critic target EMA rate (τ_Q) 0.01; Features dim. 50; Hidden dim. 1024; Exploration stddev clip 0.3; Exploration stddev value 0.2; Frames per episode 1 × 10^3; Online exploration frames up to 1 × 10^6; Offline planning frames 1 × 10^5; Critic network: (\|O\| + \|A\|) → 1024 → LayerNorm → Tanh → 1024 → ReLU → 1; Actor network: \|O\| → 50 → LayerNorm → Tanh → 1024 → ReLU → action dim |
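
For reference, below is a minimal PyTorch sketch of the critic and actor architectures as listed in Table 3. The observation and action dimensions (`obs_dim`, `action_dim`) are environment-dependent placeholders, the class and variable names are ours, and the final `tanh` on the actor output is an assumption (standard in DDPG-style agents on DeepMind Control, but not stated in Table 3); the authors' actual implementation is in the linked GitHub repository.

```python
# Sketch of the Table 3 network architectures; not the authors' code.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Q-network: (|O| + |A|) -> 1024 -> LayerNorm -> Tanh -> 1024 -> ReLU -> 1."""

    def __init__(self, obs_dim: int, action_dim: int, hidden_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + action_dim, hidden_dim),
            nn.LayerNorm(hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Q(s, a): concatenate observation and action along the feature axis.
        return self.net(torch.cat([obs, action], dim=-1))


class Actor(nn.Module):
    """Policy network: |O| -> 50 -> LayerNorm -> Tanh -> 1024 -> ReLU -> action dim."""

    def __init__(self, obs_dim: int, action_dim: int,
                 feature_dim: int = 50, hidden_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, feature_dim),  # "Features dim." 50 from Table 3
            nn.LayerNorm(feature_dim),
            nn.Tanh(),
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Assumption: squash to [-1, 1], the DeepMind Control action range.
        return torch.tanh(self.net(obs))


# Example usage with placeholder Walker-like dimensions (not from the paper).
critic = Critic(obs_dim=24, action_dim=6)
actor = Actor(obs_dim=24, action_dim=6)
optim = torch.optim.Adam(critic.parameters(), lr=1e-4)  # Table 3: Adam, lr 10^-4
```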