SPRING: Studying Papers and Reasoning to play Games
Authors: Yue Wu, So Yeon Min, Shrimai Prabhumoye, Yonatan Bisk, Russ R. Salakhutdinov, Amos Azaria, Tom M. Mitchell, Yuanzhi Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we study the quality of in-context reasoning induced by different forms of prompts under the setting of the Crafter environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University, 2NVIDIA, 3Ariel University, 4Microsoft Research |
| Pseudocode | No | The paper describes the framework and processes using figures and textual descriptions (e.g., 'Figure 1: Overview of SPRING', 'Figure 2: Paper Studying Module', 'Figure 3: Reasoning'), but it does not contain a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code at github.com/holmeswww/SPRING |
| Open Datasets | Yes | The Crafter environment (Hafner, 2021) is a procedurally generated open-world survival game for benchmarking RL algorithms with 22 achievements in a tech tree of depth 7. |
| Dataset Splits | No | The paper uses a zero-shot LLM-based approach in a game environment, which does not involve traditional dataset splits for training and validation. Comparisons are made against RL baselines trained for a specific number of steps (e.g., '1M steps'), which refers to training duration in an environment, not a dataset split. |
| Hardware Specification | No | The paper states that it uses 'GPT-3.5-turbo (OpenAI) and GPT-4 (OpenAI, 2023) from OpenAI's API', but does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running their local framework or interacting with these APIs. |
| Software Dependencies | No | The paper mentions 'GPT-3.5-turbo' and 'GPT-4' (LLM models) and 'cv2.filters' (a library function), but it does not provide specific version numbers for any software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | Prompted with the LATEX source as game context and a description of the agent’s current observation, our SPRING framework employs a directed acyclic graph (DAG) with game-related questions as nodes and dependencies as edges. We identify the optimal action to take in the environment by traversing the DAG and calculating LLM responses for each node in topological order, with the LLM’s answer to the final node directly translating to environment actions. |
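
The Experiment Setup row amounts to a topological traversal of a question DAG with one LLM call per node, where each node's prompt includes the answers of its parent nodes and the final node's answer is mapped to an environment action. Below is a minimal Python sketch of that loop; the question names, the `DEPENDENCIES` graph, and the `llm()` helper are illustrative assumptions (the paper's actual DAG has more nodes and queries GPT-3.5-turbo/GPT-4 through the OpenAI API), not the authors' released code.

```python
# Minimal sketch of a SPRING-style QA-DAG traversal.
# The questions, graph, and llm() placeholder are hypothetical.
from graphlib import TopologicalSorter

# Hypothetical question DAG: each node is a game-related question,
# each edge points from a prerequisite question to the one that uses it.
QUESTIONS = {
    "observation": "Summarize the agent's current observation.",
    "requirements": "What do the relevant sub-tasks require?",
    "best_subtask": "Which sub-task is best to pursue right now?",
    "action": "Choose the single best next action from the allowed list.",
}
DEPENDENCIES = {  # child -> set of parent nodes
    "requirements": {"observation"},
    "best_subtask": {"observation", "requirements"},
    "action": {"best_subtask"},
}


def llm(prompt: str) -> str:
    """Placeholder for a call to GPT-3.5-turbo / GPT-4 via the OpenAI API."""
    raise NotImplementedError("replace with an actual chat-completion call")


def traverse_dag(game_context: str, observation: str) -> str:
    """Answer every node in topological order; return the final node's answer."""
    answers: dict[str, str] = {}
    for node in TopologicalSorter(DEPENDENCIES).static_order():
        parent_qa = "\n".join(
            f"Q: {QUESTIONS[p]}\nA: {answers[p]}"
            for p in DEPENDENCIES.get(node, ())
        )
        prompt = (
            f"{game_context}\n\n"
            f"Observation: {observation}\n\n"
            f"{parent_qa}\n\n"
            f"Q: {QUESTIONS[node]}\nA:"
        )
        answers[node] = llm(prompt)
    # The answer to the final node is translated into an environment action.
    return answers["action"]
```

In the paper's setting, the game context is the LaTeX source of the Crafter paper, the observation is a textual description of the current game state, and the final node's answer is matched against Crafter's discrete action set before being executed in the environment.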