Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Hard-Thresholding Meets Evolution Strategies in Reinforcement Learning
Authors: Chengqian Gao, William de Vazelhes, Hualin Zhang, Bin Gu, Zhiqiang Xu
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Backed by rigorous analysis and empirical tests, NESHT demonstrates its promise in mitigating the pitfalls of irrelevant features and shines in complex decision-making problems like noisy Mujoco and Atari tasks. |
| Researcher Affiliation | Academia | ¹Mohamed bin Zayed University of Artificial Intelligence, UAE; ²School of Artificial Intelligence, Jilin University, China |
| Pseudocode | Yes | Algorithm 1 NES with Hard-Thresholding |
| Open Source Code | Yes | Our code is available at https://github.com/cangcn/NES-HT. |
| Open Datasets | Yes | We perform evaluations on two popular RL protocols, Mujoco [Todorov et al., 2012] and Atari [Bellemare et al., 2013] environments. |
| Dataset Splits | No | The paper mentions using standard Mujoco and Atari environments and describes training configurations (e.g., interaction steps, duration) but does not provide explicit numerical train/validation/test dataset splits or detailed splitting methodology. |
| Hardware Specification | No | The paper mentions training on a '500-core machine' for Atari experiments, but does not provide specific details such as CPU/GPU models, memory, or other detailed computer specifications. |
| Software Dependencies | No | The paper mentions using Mujoco and Atari environments and various RL algorithms, but it does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | To simulate decision-making in the presence of task-irrelevant features, we concatenate Gaussian noise with the environment-provided observations. Additionally, we set 90% of the immediate rewards to zero... Specifically, we train the policy for a duration of 1 hour using a 500-core machine. Furthermore, we set an upper limit on the interaction budget at 10M steps. We report the average scores received by last 10 evaluations across 20 random seeds. |
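
The two perturbations quoted in the Experiment Setup row (appending Gaussian noise to observations and zeroing 90% of immediate rewards) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the noise dimension, the use of standard-normal noise, and the choice to drop each reward independently at random are all assumptions made for illustration, since the summary above does not specify them.

```python
import numpy as np


def add_noise_features(obs, n_noise, rng):
    """Append n_noise Gaussian entries to a 1-D observation.

    Assumption: noise is standard normal; the source only says "Gaussian noise".
    """
    noise = rng.standard_normal(n_noise)
    return np.concatenate([np.asarray(obs, dtype=float), noise])


def sparsify_reward(reward, rng, drop_prob=0.9):
    """Return 0.0 with probability drop_prob, otherwise the original reward.

    Assumption: each reward is dropped independently at random; the source
    only states that 90% of immediate rewards are set to zero.
    """
    return 0.0 if rng.random() < drop_prob else float(reward)


# Hypothetical usage with a 3-dimensional observation and 5 noise features.
rng = np.random.default_rng(0)
obs = np.array([0.1, -0.2, 0.3])
noisy = add_noise_features(obs, n_noise=5, rng=rng)

# Empirically check the fraction of rewards that survive the masking.
rewards = [sparsify_reward(1.0, rng) for _ in range(10_000)]
kept_fraction = sum(r != 0.0 for r in rewards) / len(rewards)
```

In this sketch the original observation occupies the first entries of the augmented vector, so a policy that selects only those coordinates would recover the unperturbed input.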