Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Adaptive Estimation Q-learning with Uncertainty and Familiarity
Authors: Xiaoyu Gong, Shuai Lü, Jiayu Yu, Sheng Zhu, Zongze Li
IJCAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AEQ on several continuous control tasks, outperforming state-of-the-art performance. |
| Researcher Affiliation | Academia | ¹Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, China; ²College of Computer Science and Technology, Jilin University, China; ³College of Software, Jilin University, China. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 AEQ-TD3 |
| Open Source Code | Yes | To ensure that our results are convincing and reproducible, we will open-source the code. Implementations and appendix are available at: https://github.com/gxywy/AEQ |
| Open Datasets | Yes | We evaluate our method on a range of MuJoCo [Todorov et al., 2012] continuous control tasks from OpenAI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper uses continuous control tasks from OpenAI Gym and MuJoCo, which are interactive environments. It specifies total timesteps for training (e.g., "2 x 10^6 time steps") and the number of random seeds ("5 random seeds"), but not explicit train/validation/test dataset splits in the traditional sense of partitioning a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using TD3, SAC, OpenAI Gym, MuJoCo, and rl-plotter, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For AEQ-TD3, we use N = 2 and N = 10 critics with three hidden layers, βb = 0.5, βs = 0.5 for every task, and UTD = 1 for fair comparison. For AEQ-SAC, we use N = 10 and UTD = 20 to compare with REDQ [Chen et al., 2021]. |
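For readers unfamiliar with the hyperparameters quoted above: N is the number of critics in the Q-ensemble and UTD is the update-to-data ratio (gradient updates per environment step). A minimal illustrative sketch of these two ideas, in the style of REDQ-like ensemble targets; this is not the authors' AEQ implementation, and the function names are hypothetical:

```python
import random

def ensemble_min_target(q_values, subset_size=2):
    """Min over a random subset of the N critic estimates (REDQ-style).

    Taking the minimum over a subset, rather than all N critics,
    controls how pessimistic the bootstrap target is.
    """
    subset = random.sample(q_values, subset_size)
    return min(subset)

def gradient_updates(env_steps, utd=1):
    """Total critic updates implied by a given UTD ratio.

    UTD = 1 (AEQ-TD3 above) means one update per environment step;
    UTD = 20 (AEQ-SAC above) means twenty updates per step.
    """
    return env_steps * utd

# Example: with N = 10 critics, form a target from a subset of 2.
q_estimates = [101.3, 98.7, 100.2, 99.5, 102.1, 97.9, 100.8, 99.1, 101.0, 98.4]
target = ensemble_min_target(q_estimates, subset_size=2)
```

The sketch only illustrates why larger N and larger UTD change sample efficiency and estimation bias; AEQ's specific adaptive-estimation rule is described in the paper and repository linked above.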