Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning

Authors: Chengqian Gao, William de Vazelhes, Hualin Zhang, Bin Gu, Zhiqiang Xu

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Backed by rigorous analysis and empirical tests, NESHT demonstrates its promise in mitigating the pitfalls of irrelevant features and shines in complex decision-making problems like noisy Mujoco and Atari tasks.
Researcher Affiliation | Academia | Mohamed bin Zayed University of Artificial Intelligence, UAE; School of Artificial Intelligence, Jilin University, China
Pseudocode | Yes | Algorithm 1: NES with Hard-Thresholding
Open Source Code | Yes | Our code is available at https://github.com/cangcn/NES-HT.
Open Datasets | Yes | We perform evaluations on two popular RL protocols, Mujoco [Todorov et al., 2012] and Atari [Bellemare et al., 2013] environments.
Dataset Splits | No | The paper uses standard Mujoco and Atari environments and describes training configurations (e.g., interaction steps and duration), but provides no explicit train/validation/test splits or splitting methodology.
Hardware Specification | No | The paper mentions training on a "500-core machine" for the Atari experiments, but gives no further hardware details such as CPU/GPU models or memory.
Software Dependencies | No | The paper mentions the Mujoco and Atari environments and various RL algorithms, but provides no version numbers for the software dependencies or libraries used in the implementation.
Experiment Setup | Yes | To simulate decision-making in the presence of task-irrelevant features, we concatenate Gaussian noise with the environment-provided observations. Additionally, we set 90% of the immediate rewards to zero... Specifically, we train the policy for a duration of 1 hour using a 500-core machine. Furthermore, we set an upper limit on the interaction budget at 10M steps. We report the average scores received by the last 10 evaluations across 20 random seeds.
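The Pseudocode row above refers to the paper's Algorithm 1, which interleaves a natural evolution strategies (NES) update with a hard-thresholding projection so that only the s largest-magnitude policy parameters survive each iteration. Below is a minimal sketch of that pattern, not the paper's implementation: the antithetic sampling, the return standardization, and the hyperparameter names (pop, sigma, lr, s) are illustrative assumptions.

```python
import numpy as np

def hard_threshold(theta, s):
    """Keep the s largest-magnitude entries of theta and zero out the rest."""
    if s >= theta.size:
        return theta
    out = np.zeros_like(theta)
    top = np.argpartition(np.abs(theta), -s)[-s:]  # indices of the s largest |theta_i|
    out[top] = theta[top]
    return out

def nes_ht(fitness, dim, s, iters=100, pop=50, sigma=0.1, lr=0.01, seed=0):
    """Illustrative NES loop with a hard-thresholding step after each update.

    fitness: callable mapping a flat parameter vector to a scalar episodic return.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    for _ in range(iters):
        eps = rng.standard_normal((pop, dim))
        eps = np.concatenate([eps, -eps])             # antithetic pairs (assumption)
        returns = np.array([fitness(theta + sigma * e) for e in eps])
        adv = (returns - returns.mean()) / (returns.std() + 1e-8)  # standardize returns
        grad = eps.T @ adv / (len(eps) * sigma)       # NES gradient estimate
        theta = hard_threshold(theta + lr * grad, s)  # ascend, then project to sparsity
    return theta
```

The hard-thresholding step is what targets the task-irrelevant features: parameters attached to pure-noise inputs tend to stay small in magnitude and are repeatedly zeroed out.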
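The Experiment Setup row describes two perturbations: Gaussian noise concatenated to the observations and 90% of immediate rewards set to zero. A sketch of such an environment wrapper follows, assuming the Gymnasium API; the class name, the noise dimension, and the random selection of which rewards to zero are all assumptions, since the paper excerpt does not specify these details.

```python
import gymnasium as gym
import numpy as np

class NoisySparseWrapper(gym.Wrapper):
    """Append Gaussian noise to observations and zero a fraction of rewards."""

    def __init__(self, env, noise_dim=10, zero_prob=0.9, seed=0):
        super().__init__(env)
        self.noise_dim = noise_dim
        self.zero_prob = zero_prob
        self.rng = np.random.default_rng(seed)
        low = np.concatenate([env.observation_space.low, np.full(noise_dim, -np.inf)])
        high = np.concatenate([env.observation_space.high, np.full(noise_dim, np.inf)])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float64)

    def _augment(self, obs):
        # Concatenate task-irrelevant Gaussian noise features to the observation.
        return np.concatenate([obs, self.rng.standard_normal(self.noise_dim)])

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self._augment(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if self.rng.random() < self.zero_prob:  # zero ~90% of immediate rewards (assumption: random masking)
            reward = 0.0
        return self._augment(obs), reward, terminated, truncated, info

# Example usage with a hypothetical Mujoco task:
# env = NoisySparseWrapper(gym.make("HalfCheetah-v4"))
```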