Adaptive Estimation Q-learning with Uncertainty and Familiarity

Authors: Xiaoyu Gong, Shuai Lü, Jiayu Yu, Sheng Zhu, Zongze Li

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate AEQ on several continuous control tasks, outperforming state-of-the-art performance."
Researcher Affiliation | Academia | "1 Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, China; 2 College of Computer Science and Technology, Jilin University, China; 3 College of Software, Jilin University, China. lus@jlu.edu.cn, {gongxy20, yujy19, zhusheng20, zzli20}@mails.jlu.edu.cn"
Pseudocode | Yes | "Algorithm 1 AEQ-TD3" (a hedged sketch of the ensemble-critic target appears after this table)
Open Source Code | Yes | "To ensure that our results are convincing and reproducible, we will open-source the code. Implementations and appendix are available at: https://github.com/gxywy/AEQ"
Open Datasets | Yes | "We evaluate our method on a range of MuJoCo [Todorov et al., 2012] continuous control tasks from OpenAI Gym [Brockman et al., 2016]." (see the environment-setup sketch after this table)
Dataset Splits | No | The paper uses interactive continuous control environments from OpenAI Gym and MuJoCo. It specifies the total training budget (e.g., "2 x 10^6 time steps") and the number of runs ("5 random seeds"), but no explicit train/validation/test splits in the traditional sense of partitioning a fixed dataset.
Hardware Specification | No | The paper does not specify the hardware used for the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions TD3, SAC, OpenAI Gym, MuJoCo, and rl-plotter, but gives no version numbers for these dependencies.
Experiment Setup | Yes | "For AEQ-TD3, we use N = 2 and N = 10 critics with three hidden layers, βb = 0.5, βs = 0.5 for every task, and UTD = 1 for fair comparison. For AEQ-SAC, we use N = 10 and UTD = 20 to compare with REDQ [Chen et al., 2021]." (the UTD ratio is sketched after this table)