Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
LaRes: Evolutionary Reinforcement Learning with LLM-based Adaptive Reward Search
Authors: Pengyi Li, Hongyao Tang, Jinbin Qiao, Yan Zheng, Jianye Hao
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across both initialized and non-initialized settings, LaRes consistently achieves state-of-the-art performance, outperforming strong baselines in both sample efficiency and final performance. The code is available at https://github.com/yeshenpy/LaRes. |
| Researcher Affiliation | Academia | Pengyi Li, College of Intelligence and Computing, Tianjin University, EMAIL |
| Pseudocode | Yes | Algorithm 1 LaRes Framework |
| Open Source Code | Yes | The code is available at https://github.com/yeshenpy/LaRes. |
| Open Datasets | Yes | We evaluate LaRes on a wide range of benchmarks, including manipulation tasks from the MetaWorld and ManiSkill3 suites [68, 69], MinAtar tasks with image inputs [70], and locomotion tasks from MuJoCo [71]. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions training for a certain number of environment steps and running 5 independent runs for statistics, but not how the data within these environments is split for training and evaluation in a reproducible manner, or specific test environments used. |
| Hardware Specification | Yes | The MetaWorld experiments are carried out on Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz. ManiSkill3 leverages GPU acceleration; therefore, we conduct experiments on NVIDIA GTX 2080 Ti GPU with Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz. |
| Software Dependencies | No | The paper mentions software components like Python's multiprocessing library and relies on existing implementations (e.g., SAC implementation from EvoRainbow, official SAC implementation by ManiSkill3, official DQN implementation by VEB-RL), but it does not specify explicit version numbers for these software components or underlying frameworks (e.g., Python version, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | For LaRes, we set the population size to 5. We perform 5 iterations of the reward population evolution. All implementation details are provided in Appendix B. ... In all experiments, we maintain a reward function population of size 5, along with the original human-designed reward function, resulting in a total of 6 reward functions. ... The elite size is set to 3 for all tasks. Thompson sampling parameters α and β are set to 1 by default. ... For all tasks in MetaWorld, we trained for 1 million environment steps, while for all tasks in ManiSkill3, we trained for 2 million environment steps. Consequently, for tasks in MetaWorld, population evolution is performed every 200,000 environment steps, whereas for tasks in ManiSkill3, it is performed every 500,000 steps. ... Specifically, we set the UTD ratio to 1 for MetaWorld, and to 0.5 for ManiSkill3. |
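
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which may help anyone trying to reproduce the runs. The following is a minimal sketch, not the authors' code: the class name `LaResSetup`, its field names, and the helper `thompson_select` are hypothetical, and the rule that evolution fires at every multiple of the interval up to the step budget is an assumption. The excerpt above does not say what Thompson sampling selects or how it is updated, so `thompson_select` shows only the generic Beta-Bernoulli form with the quoted prior α = β = 1.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class LaResSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    total_env_steps: int
    evolution_interval: int    # environment steps between population evolutions
    utd_ratio: float           # update-to-data ratio
    population_size: int = 5   # reward functions in the evolving population
    elite_size: int = 3
    ts_alpha: float = 1.0      # Thompson sampling parameter alpha
    ts_beta: float = 1.0       # Thompson sampling parameter beta

    @property
    def num_reward_functions(self) -> int:
        # The population plus the original human-designed reward function.
        return self.population_size + 1

    def evolution_steps(self) -> list[int]:
        # Assumption: evolution happens at every positive multiple of the
        # interval, up to and including the total step budget.
        return list(range(self.evolution_interval,
                          self.total_env_steps + 1,
                          self.evolution_interval))


METAWORLD = LaResSetup(total_env_steps=1_000_000,
                       evolution_interval=200_000, utd_ratio=1.0)
MANISKILL3 = LaResSetup(total_env_steps=2_000_000,
                        evolution_interval=500_000, utd_ratio=0.5)


def thompson_select(successes, failures, alpha=1.0, beta=1.0, rng=random):
    """Generic Beta-Bernoulli Thompson sampling (illustrative only).

    Draws one sample per arm from Beta(alpha + successes, beta + failures)
    and returns the index of the arm with the largest draw.
    """
    draws = [rng.betavariate(alpha + s, beta + f)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


if __name__ == "__main__":
    print(METAWORLD.num_reward_functions, METAWORLD.evolution_steps())
    print(MANISKILL3.evolution_steps())
    print(thompson_select([3, 0, 1], [1, 2, 1], rng=random.Random(0)))
```

Under the stated scheduling assumption, the MetaWorld configuration yields five evolution points, which agrees with the five iterations quoted above. The ManiSkill3 configuration yields four under the same assumption, so the exact schedule should be checked against Appendix B of the paper or the released code.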