Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning
Authors: Chengqian Gao, William de Vazelhes, Hualin Zhang, Bin Gu, Zhiqiang Xu
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Backed by rigorous analysis and empirical tests, NESHT demonstrates its promise in mitigating the pitfalls of irrelevant features and shines in complex decision-making problems like noisy Mujoco and Atari tasks. |
| Researcher Affiliation | Academia | 1Mohamed bin Zayed University of Artificial Intelligence, UAE 2School of Artificial Intelligence, Jilin University, China |
| Pseudocode | Yes | Algorithm 1 NES with Hard-Thresholding |
| Open Source Code | Yes | Our code is available at https://github.com/cangcn/NES-HT. |
| Open Datasets | Yes | We perform evaluations on two popular RL protocols, Mujoco [Todorov et al., 2012] and Atari [Bellemare et al., 2013] environments. |
| Dataset Splits | No | The paper mentions using standard Mujoco and Atari environments and describes training configurations (e.g., interaction steps, duration) but does not provide explicit numerical train/validation/test dataset splits or detailed splitting methodology. |
| Hardware Specification | No | The paper mentions training on a '500-core machine' for Atari experiments, but does not provide specific details such as CPU/GPU models, memory, or other detailed computer specifications. |
| Software Dependencies | No | The paper mentions using Mujoco and Atari environments and various RL algorithms, but it does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | To simulate decision-making in the presence of task-irrelevant features, we concatenate Gaussian noise with the environment-provided observations. Additionally, we set 90% of the immediate rewards to zero... Specifically, we train the policy for a duration of 1 hour using a 500-core machine. Furthermore, we set an upper limit on the interaction budget at 10M steps. We report the average scores received by the last 10 evaluations across 20 random seeds. |
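The pseudocode row refers to Algorithm 1, NES with hard-thresholding. The sketch below illustrates the general idea on a toy objective: a natural-evolution-strategies gradient is estimated from Gaussian perturbations, and a hard-thresholding operator keeps only the `k` largest-magnitude policy parameters after each update. Function names, hyperparameters, and the toy objective are our own illustrative choices, not the paper's implementation (see the authors' repository for that).

```python
import numpy as np

def hard_threshold(theta, k):
    """Keep the k largest-magnitude entries of theta; zero out the rest."""
    out = theta.copy()
    out[np.argsort(np.abs(theta))[:-k]] = 0.0  # indices of the smallest entries
    return out

def nes_ht(f, theta0, k, sigma=0.1, lr=0.05, pop=50, iters=200, seed=0):
    """Illustrative NES with hard-thresholding: maximize f over k-sparse theta."""
    rng = np.random.default_rng(seed)
    theta = hard_threshold(theta0, k)
    for _ in range(iters):
        eps = rng.standard_normal((pop, theta.size))
        rewards = np.array([f(theta + sigma * e) for e in eps])
        # Standardize rewards for a more stable gradient estimate.
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        grad = (eps.T @ rewards) / (pop * sigma)
        theta = hard_threshold(theta + lr * grad, k)  # project back to k-sparse
    return theta
```

Because the hard-thresholding step runs after every NES update, the iterate stays `k`-sparse throughout, which is what lets the method ignore the injected task-irrelevant feature dimensions.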
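The experiment setup quoted above (Gaussian noise concatenated with observations, 90% of immediate rewards zeroed) can be mimicked with a small gym-style wrapper. This is a minimal sketch under our own assumptions: the class name, `noise_dim`, and `zero_prob` are illustrative, and the wrapped object only needs `reset()`/`step()` methods.

```python
import numpy as np

class NoisySparseWrapper:
    """Append task-irrelevant Gaussian noise dims to observations and
    zero out a fraction of immediate rewards (0.9, as in the quoted setup)."""
    def __init__(self, env, noise_dim, zero_prob=0.9, seed=0):
        self.env = env
        self.noise_dim = noise_dim
        self.zero_prob = zero_prob
        self.rng = np.random.default_rng(seed)

    def _augment(self, obs):
        noise = self.rng.standard_normal(self.noise_dim)
        return np.concatenate([np.asarray(obs, dtype=float), noise])

    def reset(self):
        return self._augment(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if self.rng.random() < self.zero_prob:
            reward = 0.0  # sparsify the reward signal
        return self._augment(obs), reward, done, info
```

A sparse policy that zeroes the weights attached to the appended noise dimensions recovers the original decision problem, which is the behavior the paper's noisy Mujoco/Atari evaluations probe.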