Adaptive Estimation Q-learning with Uncertainty and Familiarity
Authors: Xiaoyu Gong, Shuai Lü, Jiayu Yu, Sheng Zhu, Zongze Li
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AEQ on several continuous control tasks, outperforming state-of-the-art performance. |
| Researcher Affiliation | Academia | 1Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, China; 2College of Computer Science and Technology, Jilin University, China; 3College of Software, Jilin University, China. lus@jlu.edu.cn, {gongxy20, yujy19, zhusheng20, zzli20}@mails.jlu.edu.cn |
| Pseudocode | Yes | Algorithm 1 AEQ-TD3 |
| Open Source Code | Yes | To ensure that our results are convincing and reproducible, we will open-source the code. Implementations and appendix are available at: https://github.com/gxywy/AEQ |
| Open Datasets | Yes | We evaluate our method on a range of MuJoCo [Todorov et al., 2012] continuous control tasks from OpenAI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper uses continuous control tasks from OpenAI Gym and MuJoCo, which are interactive environments. It specifies total timesteps for training (e.g., "2 x 10^6 time steps") and the number of random seeds ("5 random seeds"), but not explicit train/validation/test dataset splits in the traditional sense of partitioning a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using TD3, SAC, OpenAI Gym, MuJoCo, and rl-plotter, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For AEQ-TD3, we use N = 2 and N = 10 critics with three hidden layers, βb = 0.5, βs = 0.5 for every task, and UTD = 1 for fair comparison. For AEQ-SAC, we use N = 10 and UTD = 20 to compare with REDQ [Chen et al., 2021]. (A hedged configuration sketch of these settings follows the table.) |
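
The following is a minimal, hypothetical sketch of how the reported experiment settings might be collected into a configuration and used to build a Gym MuJoCo environment. The names `AEQConfig`, `make_env`, and the `Hopper-v3` task id are assumptions for illustration, not the authors' code; only the numeric values are taken from the quotes above.

```python
# Hypothetical sketch only: class/function names and the env id are assumptions,
# not the authors' implementation. Values mirror the settings quoted above.
from dataclasses import dataclass

import gym  # OpenAI Gym; MuJoCo tasks additionally require the MuJoCo bindings


@dataclass
class AEQConfig:
    """Container for the experiment settings reported in the paper."""
    env_id: str = "Hopper-v3"         # example MuJoCo task id (assumed)
    num_critics: int = 2              # N = 2 for AEQ-TD3; N = 10 for AEQ-SAC
    beta_b: float = 0.5               # βb = 0.5 for every task
    beta_s: float = 0.5               # βs = 0.5 for every task
    utd_ratio: int = 1                # UTD = 1 for AEQ-TD3; UTD = 20 for AEQ-SAC
    total_timesteps: int = 2_000_000  # "2 x 10^6 time steps"
    num_seeds: int = 5                # "5 random seeds"


def make_env(cfg: AEQConfig) -> gym.Env:
    # Standard Gym construction of a MuJoCo continuous control task.
    return gym.make(cfg.env_id)


if __name__ == "__main__":
    cfg = AEQConfig()
    env = make_env(cfg)
    print(env.observation_space, env.action_space)
```

Such a sketch only fixes the hyperparameters quoted in the table; the AEQ update rule itself (Algorithm 1, AEQ-TD3) is described in the paper and the authors' repository linked above.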