LLM-Empowered State Representation for Reinforcement Learning
Authors: Boyuan Wang, Yun Qu, Yuhang Jiang, Jianzhun Shao, Chang Liu, Wenming Yang, Xiangyang Ji
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate LESR exhibits high sample efficiency and outperforms state-of-the-art baselines by an average of 29% in accumulated reward in Mujoco tasks and 30% in success rates in Gym-Robotics tasks. |
| Researcher Affiliation | Academia | Tsinghua University. |
| Pseudocode | Yes | Algorithm 1 LLM-Empowered State Representation |
| Open Source Code | Yes | Codes of LESR are accessible at https://github.com/thu-rllab/LESR. |
| Open Datasets | Yes | In this section, we will assess LLM-Empowered State Representation (LESR) through experiments on two well-established reinforcement learning (RL) benchmarks: Mujoco (Todorov et al., 2012; Brockman et al., 2016) and Gym-Robotics (de Lazcano et al., 2023). |
| Dataset Splits | No | The paper mentions 'N_small training timesteps' and 'total final evaluation timesteps N' but does not specify a distinct validation set or split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'python', 'Mujoco', and 'Gym-Robotics' but does not specify version numbers for these or any other libraries or dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | We employ the gpt-4-1106-preview as LLM to generate the state representation and intrinsic reward functions. There are three well-designed prompt templates and details of prompts are available in Appendix C. We employ the SOTA RL algorithm TD3 (Fujimoto et al., 2018) as the foundational Deep Reinforcement Learning (DRL) algorithm. For a comprehensive list of hyperparameters, please refer to Appendix G. |
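The setup row above outlines the LESR loop: an LLM (gpt-4-1106-preview) generates candidate state-representation and intrinsic-reward functions, each candidate is scored with a short training run, and the best is kept for full TD3 training. A minimal sketch of that outer loop is below; the `propose_candidates` and `short_training_score` functions are stand-ins for the paper's LLM queries and N_small TD3 runs on Mujoco/Gym-Robotics, not the authors' implementation.

```python
# Hypothetical sketch of the LESR outer loop: propose candidate
# (state-representation, intrinsic-reward) pairs, score each with a
# cheap evaluation, and keep the best pair for full training.
from typing import Callable, List, Tuple

StateFn = Callable[[List[float]], List[float]]
RewardFn = Callable[[List[float]], float]

def propose_candidates() -> List[Tuple[StateFn, RewardFn]]:
    """Stand-in for querying the LLM for candidate function pairs.
    In the paper these are Python functions generated from prompt
    templates; here they are two fixed toy candidates."""
    identity = (lambda s: list(s), lambda s: 0.0)
    squared = (lambda s: list(s) + [x * x for x in s],   # append squared features
               lambda s: -sum(abs(x) for x in s))        # penalize large states
    return [identity, squared]

def short_training_score(state_fn: StateFn, reward_fn: RewardFn) -> float:
    """Stand-in for the N_small-timestep training run that scores a
    candidate. Here: a fixed toy rollout instead of TD3 on Mujoco."""
    rollout = [[0.5, -0.2], [0.1, 0.3]]
    return sum(len(state_fn(s)) + reward_fn(s) for s in rollout)

def select_best() -> Tuple[StateFn, RewardFn]:
    """Keep the highest-scoring candidate for the full training run."""
    return max(propose_candidates(),
               key=lambda c: short_training_score(*c))

best_state_fn, best_reward_fn = select_best()
print(best_state_fn([0.5, -0.2]))  # augmented state fed to the RL agent
```

The sketch only captures the selection structure; in the paper the score comes from actual accumulated reward after N_small TD3 training timesteps, and the chosen functions are then used for the full N-timestep evaluation.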