Addressing Signal Delay in Deep Reinforcement Learning
Authors: Wei Wang, Dongqi Han, Xufang Luo, Dongsheng Li
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our methods achieve remarkable performance in continuous robotic control tasks with large delays, yielding results comparable to those in non-delayed cases. (Section 5, Experimental Results) |
| Researcher Affiliation | Collaboration | Wei Wang (1), Dongqi Han (2), Xufang Luo (2), Dongsheng Li (2); (1) Western University, Canada; (2) Microsoft Research Asia; waybaba2ww@gmail.com, {dongqihan, xufluo, dongsli}@microsoft.com |
| Pseudocode | No | The paper describes algorithms and methods using text and diagrams, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a statement or link releasing its source code. |
| Open Datasets | Yes | We performed our experimental evaluations across MuJoCo environments (Todorov et al., 2012) with signal delay. We developed an accessible plug-in environment wrapper for delayed environments, utilizing the gym.Wrapper from the OpenAI Gymnasium library (Brockman et al., 2016). |
| Dataset Splits | No | The paper conducts evaluations after a certain number of environment steps but does not specify train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU or CPU models, memory, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions using `gym.Wrapper` from the OpenAI Gymnasium library and compatibility with DRL algorithms like DDPG, TD3, SAC, but it does not specify version numbers for any software dependencies, libraries, or programming languages used. |
| Experiment Setup | Yes | For a fair and consistent evaluation, we adhere to default hyperparameters as outlined in foundational studies for each algorithm. In the case of SAC, we follow parameters from Haarnoja et al. (2018b). Our approach for technique-specific parameters begins with standard settings, followed by adjustments within a practical range. This includes tuning the prediction loss weight in methods involving prediction and encoding, testing values in the set {0.005, 0.01, 0.05, 0.1}. Additionally, we explore auto KL weight tuning, setting target KL loss ranges, and sweeping through values {5, 20, 50, 200}. For input format consistency, especially in RNN-based models, we use 32 sequences of length 64. The memory buffer size is matched to the number of environment steps. |
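The plug-in delayed-environment wrapper the paper describes can be sketched in plain Python. This is a minimal illustration of the delayed-observation idea, not the authors' released code; `DummyEnv` and the class names are hypothetical stand-ins for a `gym.Wrapper` subclass operating on a real MuJoCo environment:

```python
from collections import deque

class DummyEnv:
    """Trivial stand-in for a MuJoCo env: the observation is the step counter."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 0.0, False  # obs, reward, done

class DelayWrapper:
    """Delays observations by `delay` steps.

    For the first `delay` steps after reset, the agent keeps receiving the
    reset observation; afterwards it receives the observation from
    `delay` steps in the past.
    """
    def __init__(self, env, delay):
        self.env = env
        self.delay = delay
        self.buffer = deque()
    def reset(self):
        obs = self.env.reset()
        # Pre-fill so the first `delay` steps return stale observations.
        self.buffer = deque([obs] * self.delay)
        return obs
    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.buffer.append(obs)          # newest observation enters the queue
        return self.buffer.popleft(), reward, done  # oldest one is emitted

env = DelayWrapper(DummyEnv(), delay=2)
env.reset()                  # true obs 0
o1, _, _ = env.step(None)    # true obs is 1, agent sees 0
o2, _, _ = env.step(None)    # true obs is 2, agent sees 0
o3, _, _ = env.step(None)    # true obs is 3, agent sees 1
```

Wrapping at the environment boundary like this keeps the underlying DRL algorithm (DDPG, TD3, SAC) unmodified, which matches the plug-in design the paper attributes to its `gym.Wrapper`-based implementation.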