SPRING: Studying Papers and Reasoning to play Games
Authors: Yue Wu, So Yeon Min, Shrimai Prabhumoye, Yonatan Bisk, Russ R. Salakhutdinov, Amos Azaria, Tom M. Mitchell, Yuanzhi Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we study the quality of in-context reasoning induced by different forms of prompts under the setting of the Crafter environment. Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories. Quantitatively, SPRING with GPT-4 outperforms all state-of-the-art RL baselines, trained for 1M steps, without any training. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University, 2NVIDIA, 3Ariel University, 4Microsoft Research |
| Pseudocode | No | The paper describes the framework and processes using figures and textual descriptions (e.g., 'Figure 1: Overview of SPRING', 'Figure 2: Paper Studying Module', 'Figure 3: Reasoning'), but it does not contain a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code at github.com/holmeswww/SPRING |
| Open Datasets | Yes | The Crafter environment (Hafner, 2021) is a procedurally generated open-world survival game for benchmarking RL algorithms with 22 achievements in a tech tree of depth 7. |
| Dataset Splits | No | The paper uses a zero-shot LLM-based approach in a game environment, which does not involve traditional dataset splits for training and validation. Comparisons are made against RL baselines trained for a specific number of steps (e.g., '1M steps'), which refers to training duration in an environment, not a dataset split. |
| Hardware Specification | No | The paper states that it uses 'GPT-3.5-turbo (OpenAI) and GPT-4 (OpenAI, 2023) from OpenAI's API', but does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running their local framework or interacting with these APIs. |
| Software Dependencies | No | The paper mentions 'GPT-3.5-turbo' and 'GPT-4' (LLM models) and 'cv2.filters' (a library function), but it does not provide specific version numbers for any software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | Prompted with the LATEX source as game context and a description of the agent’s current observation, our SPRING framework employs a directed acyclic graph (DAG) with game-related questions as nodes and dependencies as edges. We identify the optimal action to take in the environment by traversing the DAG and calculating LLM responses for each node in topological order, with the LLM’s answer to the final node directly translating to environment actions. |
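
The Experiment Setup row amounts to a topological traversal of a question DAG with one LLM call per node, where each node's prompt includes the answers of its parent nodes and the final node's answer is mapped to an environment action. Below is a minimal Python sketch of that loop; the question names, the `DEPENDENCIES` graph, and the `llm()` helper are illustrative assumptions (the paper's actual DAG has more nodes and queries GPT-3.5-turbo/GPT-4 through the OpenAI API), not the authors' released code.

```python
# Minimal sketch of a SPRING-style QA-DAG traversal.
# The questions, graph, and llm() placeholder are hypothetical.
from graphlib import TopologicalSorter

# Hypothetical question DAG: each node is a game-related question,
# each edge points from a prerequisite question to the one that uses it.
QUESTIONS = {
    "observation": "Summarize the agent's current observation.",
    "requirements": "What do the relevant sub-tasks require?",
    "best_subtask": "Which sub-task is best to pursue right now?",
    "action": "Choose the single best next action from the allowed list.",
}
DEPENDENCIES = {  # child -> set of parent nodes
    "requirements": {"observation"},
    "best_subtask": {"observation", "requirements"},
    "action": {"best_subtask"},
}


def llm(prompt: str) -> str:
    """Placeholder for a call to GPT-3.5-turbo / GPT-4 via the OpenAI API."""
    raise NotImplementedError("replace with an actual chat-completion call")


def traverse_dag(game_context: str, observation: str) -> str:
    """Answer every node in topological order; return the final node's answer."""
    answers: dict[str, str] = {}
    for node in TopologicalSorter(DEPENDENCIES).static_order():
        parent_qa = "\n".join(
            f"Q: {QUESTIONS[p]}\nA: {answers[p]}"
            for p in DEPENDENCIES.get(node, ())
        )
        prompt = (
            f"{game_context}\n\n"
            f"Observation: {observation}\n\n"
            f"{parent_qa}\n\n"
            f"Q: {QUESTIONS[node]}\nA:"
        )
        answers[node] = llm(prompt)
    # The answer to the final node is translated into an environment action.
    return answers["action"]
```

In the paper's setting, the game context is the LaTeX source of the Crafter paper, the observation is a textual description of the current game state, and the final node's answer is matched against Crafter's discrete action set before being executed in the environment.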