LLM-Empowered State Representation for Reinforcement Learning
Authors: Boyuan Wang, Yun Qu, Yuhang Jiang, Jianzhun Shao, Chang Liu, Wenming Yang, Xiangyang Ji
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate LESR exhibits high sample efficiency and outperforms state-of-the-art baselines by an average of 29% in accumulated reward in Mujoco tasks and 30% in success rates in Gym-Robotics tasks. |
| Researcher Affiliation | Academia | Tsinghua University. |
| Pseudocode | Yes | Algorithm 1 LLM-Empowered State Representation |
| Open Source Code | Yes | Codes of LESR are accessible at https://github.com/thu-rllab/LESR. |
| Open Datasets | Yes | In this section, we will assess LLM-Empowered State Representation (LESR) through experiments on two well-established reinforcement learning (RL) benchmarks: Mujoco (Todorov et al., 2012; Brockman et al., 2016) and Gym-Robotics (de Lazcano et al., 2023). |
| Dataset Splits | No | The paper mentions 'N_small training timesteps' and 'total final evaluation timesteps N' but does not specify a distinct validation set or split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'python', 'Mujoco', and 'Gym-Robotics' but does not specify version numbers for these or any other libraries or dependencies required to reproduce the experiments. |
| Experiment Setup | Yes | We employ the gpt-4-1106-preview as LLM to generate the state representation and intrinsic reward functions. There are three well-designed prompt templates and details of prompts are available in Appendix C. We employ the SOTA RL algorithm TD3 (Fujimoto et al., 2018) as the foundational Deep Reinforcement Learning (DRL) algorithm. For a comprehensive list of hyperparameters, please refer to Appendix G. |
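The setup row above outlines the LESR loop: an LLM (gpt-4-1106-preview) generates candidate state-representation and intrinsic-reward functions, each candidate is scored with a short training run, and the best is kept for full TD3 training. A minimal sketch of that outer loop is below; the `propose_candidates` and `short_training_score` functions are stand-ins for the paper's LLM queries and N_small TD3 runs on Mujoco/Gym-Robotics, not the authors' implementation.

```python
# Hypothetical sketch of the LESR outer loop: propose candidate
# (state-representation, intrinsic-reward) pairs, score each with a
# cheap evaluation, and keep the best pair for full training.
from typing import Callable, List, Tuple

StateFn = Callable[[List[float]], List[float]]
RewardFn = Callable[[List[float]], float]

def propose_candidates() -> List[Tuple[StateFn, RewardFn]]:
    """Stand-in for querying the LLM for candidate function pairs.
    In the paper these are Python functions generated from prompt
    templates; here they are two fixed toy candidates."""
    identity = (lambda s: list(s), lambda s: 0.0)
    squared = (lambda s: list(s) + [x * x for x in s],   # append squared features
               lambda s: -sum(abs(x) for x in s))        # penalize large states
    return [identity, squared]

def short_training_score(state_fn: StateFn, reward_fn: RewardFn) -> float:
    """Stand-in for the N_small-timestep training run that scores a
    candidate. Here: a fixed toy rollout instead of TD3 on Mujoco."""
    rollout = [[0.5, -0.2], [0.1, 0.3]]
    return sum(len(state_fn(s)) + reward_fn(s) for s in rollout)

def select_best() -> Tuple[StateFn, RewardFn]:
    """Keep the highest-scoring candidate for the full training run."""
    return max(propose_candidates(),
               key=lambda c: short_training_score(*c))

best_state_fn, best_reward_fn = select_best()
print(best_state_fn([0.5, -0.2]))  # augmented state fed to the RL agent
```

The sketch only captures the selection structure; in the paper the score comes from actual accumulated reward after N_small TD3 training timesteps, and the chosen functions are then used for the full N-timestep evaluation.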