Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LaRes: Evolutionary Reinforcement Learning with LLM-based Adaptive Reward Search

Authors: Pengyi Li, Hongyao Tang, Jinbin Qiao, Yan Zheng, Jianye Hao

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Across both initialized and non-initialized settings, LaRes consistently achieves state-of-the-art performance, outperforming strong baselines in both sample efficiency and final performance. The code is available at https://github.com/yeshenpy/LaRes.
Researcher Affiliation | Academia | Pengyi Li, College of Intelligence and Computing, Tianjin University, EMAIL
Pseudocode | Yes | Algorithm 1: LaRes Framework
Open Source Code | Yes | The code is available at https://github.com/yeshenpy/LaRes.
Open Datasets | Yes | We evaluate LaRes on a wide range of benchmarks, including manipulation tasks from the MetaWorld and ManiSkill3 suites [68, 69], MinAtar tasks with image inputs [70], and locomotion tasks from MuJoCo [71].
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions training for a fixed number of environment steps and reporting statistics over 5 independent runs, but it does not state how data within these environments is split for training and evaluation in a reproducible manner, or which specific test environments are used.
Hardware Specification | Yes | The MetaWorld experiments are carried out on Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz. ManiSkill3 leverages GPU acceleration; therefore, we conduct experiments on NVIDIA GTX 2080 Ti GPU with Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz.
Software Dependencies | No | The paper mentions software components such as Python's multiprocessing library and relies on existing implementations (e.g., the SAC implementation from EvoRainbow, the official SAC implementation by ManiSkill3, and the official DQN implementation by VEB-RL), but it does not specify version numbers for these components or for the underlying frameworks (e.g., Python version, PyTorch/TensorFlow versions).
Experiment Setup | Yes | For LaRes, we set the population size to 5. We perform 5 iterations of the reward population evolution. All implementation details are provided in Appendix B. ... In all experiments, we maintain a reward function population of size 5, along with the original human-designed reward function, resulting in a total of 6 reward functions. ... The elite size is set to 3 for all tasks. Thompson sampling parameters α and β are set to 1 by default. ... For all tasks in MetaWorld, we trained for 1 million environment steps, while for all tasks in ManiSkill3, we trained for 2 million environment steps. Consequently, for tasks in MetaWorld, population evolution is performed every 200,000 environment steps, whereas for tasks in ManiSkill3, it is performed every 500,000 steps. ... Specifically, we set the UTD ratio to 1 for MetaWorld, and to 0.5 for ManiSkill3.
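
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class name `LaResSetup`, its field names, and the helper properties are hypothetical, and only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaResSetup:
    """Values quoted in the Experiment Setup row; all names here are hypothetical."""

    benchmark: str
    total_env_steps: int      # training budget in environment steps
    evolution_interval: int   # steps between reward population evolutions
    utd_ratio: float          # update-to-data ratio
    population_size: int = 5  # LLM-generated reward functions
    elite_size: int = 3
    thompson_alpha: float = 1.0
    thompson_beta: float = 1.0

    @property
    def total_reward_functions(self) -> int:
        # Population plus the original human-designed reward function.
        return self.population_size + 1

    @property
    def whole_intervals(self) -> int:
        # Number of whole evolution intervals that fit in the training budget.
        return self.total_env_steps // self.evolution_interval


METAWORLD = LaResSetup("MetaWorld", 1_000_000, 200_000, utd_ratio=1.0)
MANISKILL3 = LaResSetup("ManiSkill3", 2_000_000, 500_000, utd_ratio=0.5)

if __name__ == "__main__":
    for setup in (METAWORLD, MANISKILL3):
        print(setup.benchmark, setup.total_reward_functions, setup.whole_intervals)
```

The quoted text does not say how these intervals map onto the 5 reported evolution iterations, so `whole_intervals` is plain arithmetic on the stated numbers rather than a description of the training schedule.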