Addressing Signal Delay in Deep Reinforcement Learning
Authors: Wei Wang, Dongqi Han, Xufang Luo, Dongsheng Li
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our methods achieve remarkable performance in continuous robotic control tasks with large delays, yielding results comparable to those in non-delayed cases. (Section 5, Experimental Results) |
| Researcher Affiliation | Collaboration | Wei Wang (1), Dongqi Han (2), Xufang Luo (2), Dongsheng Li (2); (1) Western University, Canada; (2) Microsoft Research Asia; waybaba2ww@gmail.com, {dongqihan, xufluo, dongsli}@microsoft.com |
| Pseudocode | No | The paper describes algorithms and methods using text and diagrams, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a statement or link releasing its source code. |
| Open Datasets | Yes | We performed our experimental evaluations across MuJoCo environments (Todorov et al., 2012) with signal delay. We developed an accessible plug-in environment wrapper for delayed environments, utilizing the gym.Wrapper from the OpenAI Gymnasium library (Brockman et al., 2016). |
| Dataset Splits | No | The paper conducts evaluations after a certain number of environment steps but does not specify train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU or CPU models, memory, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions using `gym.Wrapper` from the OpenAI Gymnasium library and compatibility with DRL algorithms like DDPG, TD3, SAC, but it does not specify version numbers for any software dependencies, libraries, or programming languages used. |
| Experiment Setup | Yes | For a fair and consistent evaluation, we adhere to default hyperparameters as outlined in foundational studies for each algorithm. In the case of SAC, we follow parameters from Haarnoja et al. (2018b). Our approach for technique-specific parameters begins with standard settings, followed by adjustments within a practical range. This includes tuning the prediction loss weight in methods involving prediction and encoding, testing values in the set {0.005, 0.01, 0.05, 0.1}. Additionally, we explore auto KL weight tuning, setting target KL loss ranges, and sweeping through values {5, 20, 50, 200}. For input format consistency, especially in RNN-based models, we use 32 sequences of length 64. The memory buffer size is matched to the number of environment steps. |
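The plug-in delayed-environment wrapper the paper describes can be sketched in plain Python. This is a minimal illustration of the delayed-observation idea, not the authors' released code; `DummyEnv` and the class names are hypothetical stand-ins for a `gym.Wrapper` subclass operating on a real MuJoCo environment:

```python
from collections import deque

class DummyEnv:
    """Trivial stand-in for a MuJoCo env: the observation is the step counter."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 0.0, False  # obs, reward, done

class DelayWrapper:
    """Delays observations by `delay` steps.

    For the first `delay` steps after reset, the agent keeps receiving the
    reset observation; afterwards it receives the observation from
    `delay` steps in the past.
    """
    def __init__(self, env, delay):
        self.env = env
        self.delay = delay
        self.buffer = deque()
    def reset(self):
        obs = self.env.reset()
        # Pre-fill so the first `delay` steps return stale observations.
        self.buffer = deque([obs] * self.delay)
        return obs
    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.buffer.append(obs)          # newest observation enters the queue
        return self.buffer.popleft(), reward, done  # oldest one is emitted

env = DelayWrapper(DummyEnv(), delay=2)
env.reset()                  # true obs 0
o1, _, _ = env.step(None)    # true obs is 1, agent sees 0
o2, _, _ = env.step(None)    # true obs is 2, agent sees 0
o3, _, _ = env.step(None)    # true obs is 3, agent sees 1
```

Wrapping at the environment boundary like this keeps the underlying DRL algorithm (DDPG, TD3, SAC) unmodified, which matches the plug-in design the paper attributes to its `gym.Wrapper`-based implementation.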