Large Language Models Are Semi-Parametric Reinforcement Learning Agents

Authors: Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on two RL task sets to evaluate the proposed framework. The average results with different initializations and training sets exceed the prior SOTA by 4% and 2% in success rate on the two task sets, demonstrating the superiority and robustness of REMEMBERER.
Researcher Affiliation | Academia | Danyang Zhang (1), Lu Chen (1,2), Situo Zhang (1), Hongshen Xu (1), Zihan Zhao (1), Kai Yu (1,2). (1) X-LANCE Lab, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; (2) Suzhou Laboratory, Suzhou, China. {zhang-dy20, chenlusz, situozhang, xuhongshen, zhao_mengxin, kai.yu}@sjtu.edu.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks; it describes its procedures with mathematical equations and diagrams.
Open Source Code | Yes | The code is open-sourced at https://github.com/OpenDFM/Rememberer.
Open Datasets | Yes | 'To assess the effectiveness of REMEMBERER, we evaluate it on two recent task sets with the promising performance of LLM-based agents: WebShop and WikiHow.' WebShop [Yao et al., 2022a]; WikiHow [Zhang et al., 2023].
Dataset Splits | No | The paper explicitly mentions training and test sets but does not specify a validation set or the corresponding split percentages or counts. For example, it states: 'The agent is trained on a few tasks and tested on some other tasks to check whether the experiences from different tasks can help the agent in the decision of the unseen episodes.'
Hardware Specification | No | The paper states: 'All the experiments are conducted based on the OpenAI API of GPT-3.5 [Brown et al., 2020] text-davinci-003'. The experiments run through OpenAI's API, so the underlying hardware (e.g., GPU models, CPU types) is not specified.
Software Dependencies | No | The paper mentions the 'OpenAI API of GPT-3.5 [Brown et al., 2020] text-davinci-003' and the 'all-MiniLM-L12-v2 model from Sentence-Transformers [Reimers and Gurevych, 2019]'. While these are software components, no version numbers are given for general software dependencies such as Python, PyTorch, or other libraries. (A retrieval sketch built on these components appears after this table.)
Experiment Setup | Yes | 'The agent is trained for 3 epochs on a training set containing 10 different tasks... The learning rate, α, is 1/N where N denotes the times this value is updated. ... n-step bootstrapping [Mnih et al., 2016] is adopted to ameliorate this problem... REMEMBERER is applied to WebShop with 2-shot in-context learning. ... The m records with the highest similarities are retrieved to form the exemplars in the prompt.' (A sketch of this value update also appears after this table.)
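
The Software Dependencies row names the components behind REMEMBERER's experience retrieval. As a minimal sketch, assuming memory records keyed by their observation text and ranked by cosine similarity (the function name retrieve_exemplars, the record layout, and m=2 are illustrative assumptions, not taken from the released code), retrieval with the all-MiniLM-L12-v2 encoder could look like:

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L12-v2")

def retrieve_exemplars(current_observation, experience_memory, m=2):
    """Return the m memory records most similar to the current observation.

    experience_memory: list of dicts with an "observation" text field
    (a hypothetical layout; the actual REMEMBERER record format may differ).
    """
    query = encoder.encode(current_observation, convert_to_tensor=True)
    corpus = encoder.encode(
        [record["observation"] for record in experience_memory],
        convert_to_tensor=True,
    )
    scores = util.cos_sim(query, corpus)[0]            # shape: (len(memory),)
    top = scores.topk(min(m, len(experience_memory)))  # highest similarities
    return [experience_memory[i] for i in top.indices.tolist()]
```

The retrieved records would then be formatted as the few-shot in-context exemplars mentioned in the Experiment Setup row.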
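
The Experiment Setup row also pins down the value-update rule: a learning rate of α = 1/N makes each tabular update a running average, and n-step bootstrapping propagates reward beyond sparse terminal signals. Below is a minimal sketch, assuming a dict-based Q table, episodes as (observation, action, reward) tuples, and a SARSA-style bootstrap from the action actually taken n steps later; the paper's exact bootstrap term, discount factor, and n are not restated in this excerpt, so those are assumptions.

```python
from collections import defaultdict

GAMMA = 0.9   # discount factor (assumed value; not restated in this excerpt)
N_STEP = 3    # bootstrapping horizon (assumed value)

q_table = defaultdict(float)      # (observation, action) -> Q estimate
update_count = defaultdict(int)   # (observation, action) -> times updated, N

def n_step_update(episode):
    """Update Q estimates from one episode of (observation, action, reward)."""
    T = len(episode)
    for t in range(T):
        obs, action, _ = episode[t]
        horizon = min(t + N_STEP, T)
        # n-step return: discounted rewards over the horizon ...
        g = sum(GAMMA ** (k - t) * episode[k][2] for k in range(t, horizon))
        # ... plus a bootstrapped tail if the episode continues past it.
        if horizon < T:
            boot_obs, boot_action, _ = episode[horizon]
            g += GAMMA ** N_STEP * q_table[(boot_obs, boot_action)]
        key = (obs, action)
        update_count[key] += 1
        alpha = 1.0 / update_count[key]   # alpha = 1/N, i.e. a running average
        q_table[key] += alpha * (g - q_table[key])
```

With α = 1/N, each Q entry converges to the arithmetic mean of the n-step returns observed for that observation-action pair, matching the 'N denotes the times this value is updated' description.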