Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

Authors: Hang Wang, Sen Lin, Junshan Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are carried out to show that AdaEQ can improve the learning performance over the existing methods on the MuJoCo benchmark.
Researcher Affiliation | Academia | Hang Wang, Arizona State University, Tempe, Arizona, USA (hwang442@asu.edu); Sen Lin, Arizona State University, Tempe, Arizona, USA (slin70@asu.edu); Junshan Zhang, Arizona State University, Tempe, Arizona, USA (Junshan.Zhang@asu.edu)
Pseudocode | Yes | Algorithm 1: Adaptive Ensemble Q-learning (AdaEQ) (an illustrative sketch of the key steps is given after this table)
Open Source Code | Yes | Our training code and training logs will be available at https://github.com/ustcmike/AdaEQ_NeurIPS21
Open Datasets | Yes | To make a fair comparison, we follow the setup of [6] and use the same code base to compare the performance of AdaEQ with REDQ [6] and Average-DQN (AVG) [2] on three MuJoCo continuous control tasks: Hopper, Ant, and Walker2d. ... MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026-5033. IEEE, 2012.
Dataset Splits | No | The paper describes running "evaluation episodes" and "testing trajectories" for performance assessment, but it does not specify traditional training/validation/test dataset splits with percentages or counts, as is common in static-dataset machine learning.
Hardware Specification | Yes | We conduct all experiments using an NVIDIA GeForce RTX 3090 GPU and an Intel(R) Core(TM) i9-10900K CPU.
Software Dependencies | Yes | We use PyTorch (version 1.8.1) for implementing the deep neural networks.
Experiment Setup | Yes | The same hyperparameters are used for all the algorithms. Specifically, we consider N = 10 Q-function approximators in total. The ensemble size M = N = 10 for AVG, while the initial M for AdaEQ is set as 4. The ensemble size for REDQ is set as M = 2, which is the fine-tuned result from [6]. For all the experiments, we set the tolerance parameter c in (10) as 0.3 and the length of the testing trajectories as H = 500. The ensemble size is updated according to (10) every 10 epochs in AdaEQ. The discount factor is 0.99. ... The actor and critic networks use 2 hidden layers with 256 units and ReLU activations. We use the Adam optimizer with a learning rate of 3e-4. The replay buffer size is 1e6 and the batch size is 256. The policy is updated every 20 gradient steps. The target networks are updated with Polyak averaging with a parameter of 0.995. The exploration noise is decayed from 1 to 0.1 over 100000 steps. (A hedged configuration sketch collecting these values appears after the table.)
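
The pseudocode row above only names Algorithm 1, so the sketch below is a minimal illustration of the two ingredients this report extracts from it: a pessimistic Bellman target computed over a random subset of M of the N target Q-networks, and an ensemble-size update driven by a Monte-Carlo bias estimate over a testing trajectory of length H. The function names, the Gymnasium-style environment interface, and the simple +/-1 step on M are assumptions made for illustration; the exact update rule is Eq. (10) in the paper and the released code.

```python
# Illustrative sketch of AdaEQ-style adaptive ensemble Q-learning (hypothetical interfaces).
# Assumptions: q networks and the policy are callables returning tensors/actions, the env
# follows the Gymnasium 5-tuple step API, and Eq. (10) is approximated by a +/-1 step on M
# whenever the estimated bias leaves the tolerance band [-c, c].
import random
import torch


def ensemble_target(q_targets, next_obs, next_act, reward, done, M, gamma=0.99):
    """Bellman target built from a random subset of M out of the N target Q-networks."""
    subset = random.sample(q_targets, M)
    q_vals = torch.stack([q(next_obs, next_act) for q in subset], dim=0)
    min_q = q_vals.min(dim=0).values  # pessimistic estimate over the subset
    return reward + gamma * (1.0 - done) * min_q


def estimate_bias(q_func, policy, env, H=500, gamma=0.99):
    """Average (Q-estimate - Monte-Carlo return) along one testing trajectory of length H."""
    obs, _ = env.reset()
    q_preds, rewards = [], []
    for _ in range(H):
        act = policy(obs)
        q_preds.append(float(q_func(obs, act)))
        obs, r, terminated, truncated, _ = env.step(act)
        rewards.append(r)
        if terminated or truncated:
            break
    # Discounted Monte-Carlo returns, computed backwards from the end of the trajectory.
    mc_returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        mc_returns.append(g)
    mc_returns.reverse()
    errors = [q - g for q, g in zip(q_preds, mc_returns)]
    return sum(errors) / max(len(errors), 1)


def update_ensemble_size(M, bias, c=0.3, N=10):
    """Stand-in for Eq. (10): grow M when overestimating, shrink when underestimating.
    The lower bound of 2 is an assumption, not taken from the paper."""
    if bias > c:
        return min(M + 1, N)
    if bias < -c:
        return max(M - 1, 2)
    return M
```

In this reading, the "error feedback" is the sign and magnitude of the Monte-Carlo bias estimate: positive bias (overestimation) pushes the ensemble toward a larger, more pessimistic min-subset, and negative bias pushes it back toward a smaller one.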
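
For convenience, the hyperparameters quoted in the Experiment Setup row can be gathered into one configuration object. The dictionary below is a hedged restatement of only the reported values; the key names and the MuJoCo task identifiers (e.g. "Hopper-v2") are assumptions, since the paper names the tasks but not the exact environment versions.

```python
# Hedged reconstruction of the reported experiment setup; values are taken verbatim
# from the Experiment Setup row, everything else (key names, env IDs) is assumed.
config = dict(
    envs=["Hopper-v2", "Ant-v2", "Walker2d-v2"],  # assumed IDs for Hopper, Ant, Walker2d
    num_q_networks=10,         # N: total Q-function approximators
    init_ensemble_size=4,      # initial M for AdaEQ (M = N = 10 for AVG, M = 2 for REDQ)
    tolerance_c=0.3,           # tolerance parameter c in Eq. (10)
    test_traj_len=500,         # H: length of the testing trajectories
    ensemble_update_every=10,  # epochs between applications of Eq. (10)
    gamma=0.99,                # discount factor
    hidden_sizes=(256, 256),   # actor and critic: 2 hidden layers of 256 units, ReLU
    lr=3e-4,                   # Adam learning rate
    replay_size=int(1e6),
    batch_size=256,
    policy_update_every=20,    # gradient steps between policy updates
    polyak=0.995,              # target-network averaging coefficient
    expl_noise_start=1.0,      # exploration noise decayed from 1.0 ...
    expl_noise_end=0.1,        # ... down to 0.1
    expl_noise_decay_steps=100_000,
)
```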