Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Authors: Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on two RL task sets to evaluate the proposed framework. The average results with different initialization and training sets exceed the prior SOTA by 4% and 2% for the success rate on two task sets and demonstrate the superiority and robustness of REMEMBERER. |
| Researcher Affiliation | Academia | Danyang Zhang¹, Lu Chen¹,², Situo Zhang¹, Hongshen Xu¹, Zihan Zhao¹, Kai Yu¹,²; ¹X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; ²Suzhou Laboratory, Suzhou, China; {zhang-dy20,chenlusz,situozhang,xuhongshen,zhao_mengxin,kai.yu}@sjtu.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It uses mathematical equations and diagrams to describe processes. |
| Open Source Code | Yes | The codes are open-sourced at https://github.com/OpenDFM/Rememberer. |
| Open Datasets | Yes | To assess the effectiveness of REMEMBERER, we evaluate it on two recent task sets with the promising performance of LLM-based agents: WebShop and WikiHow. ... WebShop [Yao et al., 2022a] ... WikiHow [Zhang et al., 2023] |
| Dataset Splits | No | The paper explicitly mentions training and test sets but does not specify a validation set or its corresponding split percentages or counts. For example, it states: 'The agent is trained on a few tasks and tested on some other tasks to check whether the experiences from different tasks can help the agent in the decision of the unseen episodes.' |
| Hardware Specification | No | The paper states: 'All the experiments are conducted based on the OpenAI API of GPT-3.5 [Brown et al., 2020] text-davinci-003'. This indicates the use of OpenAI's API, but it does not specify the underlying hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'OpenAI API of GPT-3.5 [Brown et al., 2020] text-davinci-003' and 'all-MiniLM-L12-v2 model from Sentence-Transformers [Reimers and Gurevych, 2019]'. While these are software components, specific version numbers for general software dependencies (like Python, PyTorch, or other libraries) are not provided. |
| Experiment Setup | Yes | The agent is trained for 3 epochs on a training set containing 10 different tasks... The learning rate, α, is 1/N where N denotes the times this value is updated. ... n-step bootstrapping [Mnih et al., 2016] is adopted to ameliorate this problem... REMEMBERER is applied to WebShop with 2-shot in-context learning. ... The m records with the highest similarities are retrieved to form the exemplars in the prompt. |
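
The Experiment Setup row quotes the paper's update rule: the learning rate α is 1/N, so each stored Q-value is the running mean of its n-step bootstrapped targets. The sketch below illustrates that rule under stated assumptions; the names `ExperienceMemory` and `n_step_targets`, the discount factor, the value of n, and the episode format are all illustrative, not taken from the paper.

```python
from collections import defaultdict

GAMMA = 0.9   # discount factor (illustrative; the paper's value may differ)
N_STEP = 3    # n for n-step bootstrapping (illustrative)

class ExperienceMemory:
    """Hypothetical tabular memory keyed by (observation, action) pairs."""

    def __init__(self):
        self.q = defaultdict(float)       # Q(o, a) estimates
        self.counts = defaultdict(int)    # N: update counts per (o, a)

    def max_q(self, observation):
        """Max Q-value over stored actions for an observation (0 if unseen)."""
        candidates = [v for (o, _), v in self.q.items() if o == observation]
        return max(candidates, default=0.0)

    def update(self, observation, action, target):
        key = (observation, action)
        self.counts[key] += 1
        alpha = 1.0 / self.counts[key]    # alpha = 1/N, as stated in the paper
        self.q[key] += alpha * (target - self.q[key])

def n_step_targets(memory, episode):
    """Compute n-step bootstrapped targets for one episode.

    `episode` is a list of (observation, action, reward) steps.
    """
    targets = []
    for t, (obs, act, _) in enumerate(episode):
        # Discounted rewards over the next N_STEP steps (or to episode end)...
        g = sum(GAMMA ** k * episode[t + k][2]
                for k in range(min(N_STEP, len(episode) - t)))
        # ...plus the bootstrapped value of the state N_STEP steps ahead.
        if t + N_STEP < len(episode):
            g += GAMMA ** N_STEP * memory.max_q(episode[t + N_STEP][0])
        targets.append((obs, act, g))
    return targets
```

With α = 1/N, each update makes Q(o, a) exactly the arithmetic mean of all targets seen so far for that entry, which matches the quoted description.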
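The setup also retrieves the m most similar experience records to build in-context exemplars, and the Software Dependencies row names the all-MiniLM-L12-v2 Sentence-Transformers model. Below is a minimal sketch of such similarity-based retrieval, assuming plain cosine similarity over sentence embeddings; the paper's actual similarity function and record format may differ, and `retrieve_exemplars` is a hypothetical helper.

```python
from sentence_transformers import SentenceTransformer, util

# Model named in the paper's dependencies; loaded from the Hugging Face hub.
model = SentenceTransformer("all-MiniLM-L12-v2")

def retrieve_exemplars(query: str, records: list[str], m: int = 2) -> list[str]:
    """Return the m stored records most similar to the current query."""
    query_emb = model.encode(query, convert_to_tensor=True)
    record_embs = model.encode(records, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, record_embs)[0]   # shape: (len(records),)
    top = scores.topk(k=min(m, len(records))).indices.tolist()
    return [records[i] for i in top]

# Example: pick 2-shot exemplars for the prompt, echoing the WebShop setting.
memory = [
    "Task: buy a red ceramic mug under $10 ...",
    "Task: find wireless earbuds with noise cancelling ...",
    "Task: buy a blue ceramic mug under $12 ...",
]
exemplars = retrieve_exemplars("Task: buy a green mug under $15", memory, m=2)
```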