reproducibilityindex.ai

Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning

Authors: Chengqian Gao, William de Vazelhes, Hualin Zhang, Bin Gu, Zhiqiang Xu

IJCAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Backed by rigorous analysis and empirical tests, NESHT demonstrates its promise in mitigating the pitfalls of irrelevant features and shines in complex decision-making problems like noisy Mujoco and Atari tasks.
Researcher Affiliation	Academia	1 Mohamed bin Zayed University of Artificial Intelligence, UAE; 2 School of Artificial Intelligence, Jilin University, China
Pseudocode	Yes	Algorithm 1 NES with Hard-Thresholding
Open Source Code	Yes	Our code is available at https://github.com/cangcn/NES-HT.
Open Datasets	Yes	We perform evaluations on two popular RL protocols, Mujoco [Todorov et al., 2012] and Atari [Bellemare et al., 2013] environments.
Dataset Splits	No	The paper mentions using standard Mujoco and Atari environments and describes training configurations (e.g., interaction steps, duration) but does not provide explicit numerical train/validation/test dataset splits or detailed splitting methodology.
Hardware Specification	No	The paper mentions training on a '500-core machine' for Atari experiments, but does not provide specific details such as CPU/GPU models, memory, or other detailed computer specifications.
Software Dependencies	No	The paper mentions using Mujoco and Atari environments and various RL algorithms, but it does not provide specific version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup	Yes	To simulate decision-making in the presence of task-irrelevant features, we concatenate Gaussian noise with the environment-provided observations. Additionally, we set 90% of the immediate rewards to zero... Specifically, we train the policy for a duration of 1 hour using a 500-core machine. Furthermore, we set an upper limit on the interaction budget at 10M steps. We report the average scores received by last 10 evaluations across 20 random seeds.
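
The experiment setup quoted above (Gaussian noise concatenated with observations, 90% of immediate rewards set to zero) and the hard-thresholding step named in Algorithm 1 can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `n_noise` parameter, the unit noise variance, and the choice to zero each reward independently with probability 0.9 are all assumptions made for illustration; `hard_threshold` is the standard top-k magnitude operator, which may differ in detail from the paper's version.

```python
import numpy as np


def add_noise_features(obs, n_noise, rng):
    """Append n_noise Gaussian entries to a 1-D observation.

    Assumption: noise is standard normal and resampled on every call.
    """
    obs = np.asarray(obs, dtype=np.float64)
    noise = rng.standard_normal(n_noise)
    return np.concatenate([obs, noise])


def sparsify_reward(reward, rng, drop_prob=0.9):
    """Return 0.0 with probability drop_prob, otherwise the original reward.

    Assumption: each step's reward is dropped independently.
    """
    return 0.0 if rng.random() < drop_prob else float(reward)


def hard_threshold(theta, k):
    """Keep the k largest-magnitude entries of theta and zero the rest."""
    theta = np.asarray(theta, dtype=np.float64)
    if k >= theta.size:
        return theta.copy()
    out = np.zeros_like(theta)
    if k <= 0:
        return out
    # argpartition places the indices of the k largest |theta| values last.
    keep = np.argpartition(np.abs(theta), -k)[-k:]
    out[keep] = theta[keep]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_obs = add_noise_features(np.ones(3), n_noise=5, rng=rng)
    print(noisy_obs.shape)                      # 3 real + 5 noise features
    print(sparsify_reward(1.0, rng))            # mostly 0.0 over many calls
    print(hard_threshold([0.1, -3.0, 2.0, 0.5], k=2))
```

In a sketch like this, the sparsity level `k` would be chosen so that the policy parameters attached to the appended noise dimensions can be zeroed out, which is the motivation the paper gives for combining hard-thresholding with NES.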