Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Adaptive Estimation Q-learning with Uncertainty and Familiarity
Authors: Xiaoyu Gong, Shuai Lü, Jiayu Yu, Sheng Zhu, Zongze Li
IJCAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AEQ on several continuous control tasks, outperforming state-of-the-art performance. |
| Researcher Affiliation | Academia | ¹Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, China; ²College of Computer Science and Technology, Jilin University, China; ³College of Software, Jilin University, China. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 AEQ-TD3 |
| Open Source Code | Yes | To ensure that our results are convincing and reproducible, we will open-source the code. Implementations and appendix are available at: https://github.com/gxywy/AEQ |
| Open Datasets | Yes | We evaluate our method on a range of MuJoCo [Todorov et al., 2012] continuous control tasks from OpenAI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper uses continuous control tasks from OpenAI Gym and MuJoCo, which are interactive environments. It specifies total timesteps for training (e.g., "2 x 10^6 time steps") and the number of random seeds ("5 random seeds"), but not explicit train/validation/test dataset splits in the traditional sense of partitioning a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using TD3, SAC, OpenAI Gym, MuJoCo, and rl-plotter, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For AEQ-TD3, we use N = 2 and N = 10 critics with three hidden layers, βb = 0.5, βs = 0.5 for every task, and UTD = 1 for fair comparison. For AEQ-SAC, we use N = 10 and UTD = 20 to compare with REDQ [Chen et al., 2021]. |
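For readers unfamiliar with the hyperparameters quoted above: N is the number of critics in the Q-ensemble and UTD is the update-to-data ratio (gradient updates per environment step). A minimal illustrative sketch of these two ideas, in the style of REDQ-like ensemble targets; this is not the authors' AEQ implementation, and the function names are hypothetical:

```python
import random

def ensemble_min_target(q_values, subset_size=2):
    """Min over a random subset of the N critic estimates (REDQ-style).

    Taking the minimum over a subset, rather than all N critics,
    controls how pessimistic the bootstrap target is.
    """
    subset = random.sample(q_values, subset_size)
    return min(subset)

def gradient_updates(env_steps, utd=1):
    """Total critic updates implied by a given UTD ratio.

    UTD = 1 (AEQ-TD3 above) means one update per environment step;
    UTD = 20 (AEQ-SAC above) means twenty updates per step.
    """
    return env_steps * utd

# Example: with N = 10 critics, form a target from a subset of 2.
q_estimates = [101.3, 98.7, 100.2, 99.5, 102.1, 97.9, 100.8, 99.1, 101.0, 98.4]
target = ensemble_min_target(q_estimates, subset_size=2)
```

The sketch only illustrates why larger N and larger UTD change sample efficiency and estimation bias; AEQ's specific adaptive-estimation rule is described in the paper and repository linked above.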