LLM-Empowered State Representation for Reinforcement Learning

Authors: Boyuan Wang, Yun Qu, Yuhang Jiang, Jianzhun Shao, Chang Liu, Wenming Yang, Xiangyang Ji

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results demonstrate LESR exhibits high sample efficiency and outperforms state-of-the-art baselines by an average of 29% in accumulated reward in Mujoco tasks and 30% in success rates in Gym-Robotics tasks. |
| Researcher Affiliation | Academia | Tsinghua University. |
| Pseudocode | Yes | Algorithm 1 LLM-Empowered State Representation |
| Open Source Code | Yes | Codes of LESR are accessible at https://github.com/thu-rllab/LESR. |
| Open Datasets | Yes | In this section, we will assess LLM-Empowered State Representation (LESR) through experiments on two well-established reinforcement learning (RL) benchmarks: Mujoco (Todorov et al., 2012; Brockman et al., 2016) and Gym-Robotics (de Lazcano et al., 2023). |
| Dataset Splits | No | The paper mentions "N_small training timesteps" and "total final evaluation timesteps N" but does not specify a distinct validation set or split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as Python, Mujoco, and Gym-Robotics but does not specify version numbers for these or any other libraries or dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | We employ the gpt-4-1106-preview as LLM to generate the state representation and intrinsic reward functions. There are three well-designed prompt templates and details of prompts are available in Appendix C. We employ the SOTA RL algorithm TD3 (Fujimoto et al., 2018) as the foundational Deep Reinforcement Learning (DRL) algorithm. For a comprehensive list of hyperparameters, please refer to Appendix G. |
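To make the Experiment Setup and Open Datasets rows concrete, the sketch below shows one way the components described there (an LLM-generated state-representation function and intrinsic reward function, combined with TD3 on a Mujoco task) could be wired together. This is not the authors' implementation (see the repository linked in the table); the `LESRWrapper` class, the stand-in functions `llm_generated_state_representation` and `llm_generated_intrinsic_reward`, and the use of gymnasium's HalfCheetah-v4 with stable-baselines3's TD3 are all illustrative assumptions. In LESR itself, the two function bodies are produced by gpt-4-1106-preview from the paper's prompt templates (Appendix C).

```python
# Minimal sketch of an LESR-style pipeline (illustrative only; the official code is
# at https://github.com/thu-rllab/LESR). Assumes gymnasium with the MuJoCo extras
# and stable-baselines3 are installed.
import numpy as np
import gymnasium as gym
from stable_baselines3 import TD3


def llm_generated_state_representation(obs: np.ndarray) -> np.ndarray:
    """Stand-in for the representation function the LLM would generate.

    The real function body comes from gpt-4-1106-preview; here we simply append
    element-wise squares of the raw observation as a hand-written placeholder.
    """
    return np.concatenate([obs, obs ** 2]).astype(np.float32)


def llm_generated_intrinsic_reward(obs: np.ndarray) -> float:
    """Stand-in for the LLM-generated intrinsic reward function."""
    return float(-0.001 * np.sum(obs ** 2))


class LESRWrapper(gym.Wrapper):
    """Augments observations with the generated features and adds intrinsic reward."""

    def __init__(self, env: gym.Env, intrinsic_coef: float = 1.0):
        super().__init__(env)
        self.intrinsic_coef = intrinsic_coef
        raw_dim = env.observation_space.shape[0]
        aug_dim = llm_generated_state_representation(
            np.zeros(raw_dim, dtype=np.float32)
        ).shape[0]
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(aug_dim,), dtype=np.float32
        )

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return llm_generated_state_representation(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Extrinsic task reward plus the LLM-designed intrinsic bonus.
        reward += self.intrinsic_coef * llm_generated_intrinsic_reward(obs)
        return llm_generated_state_representation(obs), reward, terminated, truncated, info


if __name__ == "__main__":
    env = LESRWrapper(gym.make("HalfCheetah-v4"))
    model = TD3("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)  # small budget, for illustration only
```

The wrapper keeps the base algorithm (TD3) unchanged and confines the LLM-generated code to the observation and reward paths, which mirrors the paper's framing of state representation and intrinsic reward as plug-in components; the iterative feedback loop over N_small training runs described in the paper is omitted here.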