Dropout Q-Functions for Doubly Efficient Reinforcement Learning

Authors: Takuya Hiraoka, Takahisa Imagawa, Taisei Hashimoto, Takashi Onishi, Yoshimasa Tsuruoka

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Despite its simplicity of implementation, our experimental results indicate that DroQ is doubly (sample and computationally) efficient. It achieved comparable sample efficiency with REDQ, much better computational efficiency than REDQ, and comparable computational efficiency with that of SAC.
Researcher Affiliation Collaboration Takuya Hiraoka (1,2), Takahisa Imagawa (2), Taisei Hashimoto (2,3), Takashi Onishi (1,2), Yoshimasa Tsuruoka (2,3). 1 NEC Corporation; 2 National Institute of Advanced Industrial Science and Technology; 3 The University of Tokyo.
Pseudocode Yes "Algorithm 1 REDQ" and "Algorithm 2 DroQ" (a sketch of the DroQ dropout Q-function appears after this table).
Open Source Code Yes Our source code is available at https://github.com/TakuyaHiraoka/Dropout-Q-Functions-for-Doubly-Efficient-Reinforcement-Learning
Open Datasets Yes To evaluate the performances of DroQ, we compared DroQ with three baseline methods in MuJoCo benchmark environments (Todorov et al., 2012; Brockman et al., 2016). Following Chen et al. (2021b); Janner et al. (2019), we prepared the following environments: Hopper, Walker2d, Ant, and Humanoid.
Dataset Splits No The paper describes running 'ten test episodes with the current policy' and recording the average return after every epoch, which serves as evaluation during training. However, it does not specify explicit train/validation/test dataset splits in the traditional supervised-learning sense, as data is generated through environment interaction (this evaluation protocol is sketched after this table).
Hardware Specification Yes For evaluation, we ran each method on a machine equipped with two Intel(R) Xeon(R) CPU E5-2667 v4 and one NVIDIA Tesla K80.
Software Dependencies No The paper mentions using 'Adam' as an optimizer and references the PyTorch profiler, but it does not specify version numbers for key software components like Python, PyTorch, or CUDA that would be needed for replication.
Experiment Setup Yes "The hyperparameter settings for each method in the experiments discussed in Section 4 are listed in Table 8. Parameter values, except for (i) dropout rate for DroQ and DUVN and (ii) M for DUVN, were set according to Chen et al. (2021b). The dropout rate (i) was set through line search, and M for DUVN (ii) was set according to Harrigan (2016); Moerland et al. (2017)." Table 8 (hyperparameter settings; a configuration sketch follows this table):
- SAC, REDQ, DroQ, and DUVN: optimizer Adam (Kingma & Ba, 2015); learning rate 3e-4; discount rate (γ) 0.99; target-smoothing coefficient (ρ) 0.005; replay buffer size 10^6; number of hidden layers for all networks 2; number of hidden units per layer 256; mini-batch size 256; random starting data 5000; UTD ratio G 20
- REDQ and DroQ: in-target minimization parameter M 2
- REDQ: ensemble size N 10
- DroQ and DUVN: dropout rate 0.01
- DUVN: in-target minimization parameter M 1
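
As noted in the Pseudocode entry above, the paper's Algorithm 2 (DroQ) replaces REDQ's large Q-ensemble with a small number of Q-functions that use dropout and layer normalization. The PyTorch sketch below shows one such dropout Q-function under the Table 8 settings (two hidden layers, 256 units, dropout rate 0.01); the class and argument names are illustrative and not taken from the authors' implementation.

```python
import torch
import torch.nn as nn

class DropoutQFunction(nn.Module):
    """Q-network with dropout and layer normalization, in the style of DroQ.

    Illustrative sketch only; not the authors' exact code.
    """
    def __init__(self, obs_dim, act_dim, hidden=256, dropout_rate=0.01):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.Dropout(p=dropout_rate),   # dropout after each hidden linear layer
            nn.LayerNorm(hidden),         # layer normalization stabilizes training
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.Dropout(p=dropout_rate),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),         # scalar Q-value for the (state, action) pair
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))
```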
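
The Dataset Splits entry quotes the evaluation protocol: after every epoch, ten test episodes are run with the current policy and the average return is recorded. A minimal sketch of that loop, assuming the classic Gym step/reset API and MuJoCo environment ids matching the Open Datasets entry (the exact version suffixes are an assumption):

```python
import gym
import numpy as np

# Benchmark environments used in the paper; Gym ids and version suffixes assumed here.
ENV_IDS = ["Hopper-v2", "Walker2d-v2", "Ant-v2", "Humanoid-v2"]

def evaluate(env, policy, num_episodes=10):
    """Run test episodes with the current policy and return the average return,
    mirroring the per-epoch evaluation protocol described in the paper."""
    returns = []
    for _ in range(num_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = policy(obs)                      # e.g. the mean action for evaluation
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return float(np.mean(returns))

# Example usage (policy is any callable mapping observation -> action):
# env = gym.make(ENV_IDS[0])
# avg_return = evaluate(env, policy)
```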
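
For convenience, the Table 8 settings from the Experiment Setup entry can be collected in a single configuration mapping. This is only a sketch of the reported values (key names are made up here); the learning rate is written as 3e-4, the value used in Chen et al. (2021b), whose settings the paper follows.

```python
# Hyperparameters from Table 8 (shared SAC/REDQ/DroQ/DUVN settings plus
# method-specific entries); key names are illustrative.
config = {
    "optimizer": "Adam",
    "learning_rate": 3e-4,
    "discount_gamma": 0.99,
    "target_smoothing_rho": 0.005,
    "replay_buffer_size": 10**6,
    "num_hidden_layers": 2,
    "hidden_units_per_layer": 256,
    "mini_batch_size": 256,
    "random_starting_data": 5000,
    "utd_ratio_G": 20,
    "redq_droq_in_target_min_M": 2,
    "redq_ensemble_size_N": 10,
    "droq_duvn_dropout_rate": 0.01,
    "duvn_in_target_min_M": 1,
}
```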