Addressing Signal Delay in Deep Reinforcement Learning

Authors: Wei Wang, Dongqi Han, Xufang Luo, Dongsheng Li

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our methods achieve remarkable performance in continuous robotic control tasks with large delays, yielding results comparable to those in non-delayed cases." (Section 5, Experimental Results)
Researcher Affiliation | Collaboration | Wei Wang (Western University, Canada); Dongqi Han, Xufang Luo, Dongsheng Li (Microsoft Research Asia). Contact: waybaba2ww@gmail.com, {dongqihan, xufluo, dongsli}@microsoft.com
Pseudocode | No | The paper describes its algorithms and methods using text and diagrams, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a link to a public source-code release.
Open Datasets | Yes | "We performed our experimental evaluations across MuJoCo environments (Todorov et al., 2012) with signal delay. We developed an accessible plug-in environment wrapper for delayed environments, utilizing the gym.Wrapper from the OpenAI Gymnasium library (Brockman et al., 2016)." A sketch of such a delay wrapper appears after this table.
Dataset Splits | No | The paper evaluates performance after a fixed number of environment steps and does not describe train/validation/test dataset splits.
Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as GPU or CPU models, memory, or cloud instance specifications.
Software Dependencies | No | The paper mentions using `gym.Wrapper` from the `OpenAI Gymnasium` library and compatibility with DRL algorithms such as DDPG, TD3, and SAC, but it does not specify version numbers for any software dependencies, libraries, or programming languages.
Experiment Setup | Yes | "For a fair and consistent evaluation, we adhere to default hyperparameters as outlined in foundational studies for each algorithm. In the case of SAC, we follow parameters from Haarnoja et al. (2018b). Our approach for technique-specific parameters begins with standard settings, followed by adjustments within a practical range. This includes tuning the prediction loss weight in methods involving prediction and encoding, testing values in the set {0.005, 0.01, 0.05, 0.1}. Additionally, we explore auto KL weight tuning, setting target KL loss ranges, and sweeping through values {5, 20, 50, 200}. For input format consistency, especially in RNN-based models, we use 32 sequences of length 64. The memory buffer size is matched to the number of environment steps." A configuration sketch collecting these search ranges follows the table.
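
The paper's released wrapper is not reproduced in this report; the block below is a minimal sketch of how a plug-in observation-delay wrapper could be written with `gym.Wrapper`, assuming the classic Gym `step()` API that returns a 4-tuple (Gymnasium's 5-tuple API would need a small change). The class name `DelayedObservationWrapper` and the default `delay=5` are illustrative choices, not the authors'.

```python
from collections import deque

import gym


class DelayedObservationWrapper(gym.Wrapper):
    """Return observations delayed by `delay` environment steps (illustrative sketch)."""

    def __init__(self, env, delay=5):
        super().__init__(env)
        self.delay = delay
        # Buffer holds the most recent `delay + 1` observations; index 0 is the oldest.
        self._obs_buffer = deque(maxlen=delay + 1)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._obs_buffer.clear()
        # Until `delay` fresh observations accumulate, the agent keeps
        # receiving the initial observation.
        for _ in range(self.delay + 1):
            self._obs_buffer.append(obs)
        return self._obs_buffer[0]

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._obs_buffer.append(obs)
        # The oldest buffered observation is what the agent sees, i.e. the
        # observed signal lags the true environment state by `delay` steps.
        return self._obs_buffer[0], reward, done, info


# Usage: wrap any MuJoCo task, e.g.
# env = DelayedObservationWrapper(gym.make("HalfCheetah-v2"), delay=5)
```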
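As a reading aid for the Experiment Setup row, here is a hedged configuration sketch that collects the quoted search ranges in one place; all key and function names (`pred_loss_weight`, `target_kl`, `replay_buffer_size`, etc.) are assumptions for illustration, not the authors' actual configuration schema.

```python
# Hypothetical configuration sketch; key names are illustrative only.
sweep_config = {
    # SAC itself uses the default hyperparameters from Haarnoja et al. (2018b).
    "pred_loss_weight": [0.005, 0.01, 0.05, 0.1],        # prediction/encoding loss weight sweep
    "target_kl": [5, 20, 50, 200],                       # targets swept for auto KL weight tuning
    "rnn_input": {"num_sequences": 32, "seq_len": 64},   # RNN minibatch format
}


def replay_buffer_size(total_env_steps: int) -> int:
    # The memory buffer size is matched to the number of environment steps.
    return total_env_steps
```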