Bootstrapping Cognitive Agents with a Large Language Model

Authors: Feiyu Zhu, Reid Simmons

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Evidence: "Our experiments also indicate that the cognitive agent bootstrapped using this framework can generalize to novel environments and be scaled to complex tasks." From the experiment setup: "Following previous works in the embodied agents domain (Sarch et al. 2022; Trabucco et al. 2023), we evaluate our method in kitchen environments (see Figure 2) in the AI2THOR simulator (Kolve et al. 2017)." From the results: "Table 1 shows the quantitative results of different types of agents performing each kitchen task."
Researcher Affiliation: Academia. Feiyu Zhu, Reid Simmons, Carnegie Mellon University; feiyuz@andrew.cmu.edu, rsimmons@andrew.cmu.edu.
Pseudocode: No. The paper includes "Listing 1: Production interface", which shows a Python class structure, but this is an interface snippet for generated code rather than a pseudocode block describing the methodology.
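The listing itself is not reproduced in this review, so the sketch below is a hypothetical illustration of what a Python production interface of the kind described might look like: a condition-action rule with a learned utility. All names (`Production`, `precondition`, `apply`, `ExampleStep`) are assumptions, not the paper's actual Listing 1.

```python
from abc import ABC, abstractmethod


class Production(ABC):
    """One condition-action rule in a cognitive agent's procedural memory.

    Hypothetical sketch: the paper's Listing 1 is not reproduced here,
    so this interface and its names are assumptions.
    """

    utility: float = 0.0  # learned usefulness of this production

    @abstractmethod
    def precondition(self, state: dict) -> bool:
        """Return True if this production can fire in `state`."""

    @abstractmethod
    def apply(self, state: dict) -> dict:
        """Return the successor state after firing this production."""


class ExampleStep(Production):
    """Toy illustration only, not a production from the paper."""

    def precondition(self, state: dict) -> bool:
        return "counter" in state

    def apply(self, state: dict) -> dict:
        return {**state, "counter": state["counter"] + 1}
```

With this shape, an agent loop would filter productions by `precondition` and fire one `apply` per cycle; the `utility` attribute is where the paper's utility-learning values would be stored.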
Open Source Code: Yes. Evidence: "Code at github.com/zfy0314/cognitive-agents"
Open Datasets: Yes. Evidence: "we evaluate our method in kitchen environments (see Figure 2) in the AI2THOR simulator (Kolve et al. 2017). For find and slice, 5 target objects are chosen for each task, and we run 3 trials for each object where the initial locations of the objects are shuffled."
Dataset Splits: No. The paper describes a "training floor plan" and a "testing floor plan" and running trials, but does not provide quantitative train/validation/test splits for a static dataset.
Hardware Specification: No. The paper does not specify the hardware (e.g., GPU or CPU models) used to run the experiments.
Software Dependencies: No. The paper mentions using "GPT4-0613 (Open AI 2023)" for LLM queries, but does not list software dependencies (e.g., programming-language or library versions) for the implementation.
Experiment Setup: Yes. Evidence: "We use GPT4-0613 (Open AI 2023) for our experiments as previous works have shown that GPT3.5 is insufficient for code generation (Olausson et al. 2023; Wang et al. 2023). We set the temperature to 0 for the most deterministic response." On utility learning: "U(P) is the utility of production P, N(P) is the number of times P gets applied, t is the time difference from production application to the done action, and γ is the discount factor (which is set to 0.95 for our experiments)."
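The quoted passage defines the symbols U(P), N(P), t, and γ, but the update rule itself is not reproduced in this review. Below is a minimal sketch of one plausible reading, a running average of rewards discounted by γ^t over the N(P) applications of a production; the function name and the averaging form are assumptions, not the paper's equation.

```python
def update_utility(utility: float, n_applied: int, reward: float,
                   t: int, gamma: float = 0.95) -> tuple[float, int]:
    """Fold one discounted reward into a production's running-average utility.

    Hypothetical reading of the paper's symbols: utility = U(P),
    n_applied = N(P), t = time difference from production application
    to the `done` action, gamma = discount factor (0.95 in the paper's
    experiments). The averaging scheme is an assumption.
    """
    discounted = (gamma ** t) * reward  # reward decays with delay t
    new_n = n_applied + 1
    # Incremental running average over all applications of this production.
    new_utility = utility + (discounted - utility) / new_n
    return new_utility, new_n
```

For example, a first application rewarded 1.0 that fired two steps before `done` would, under this reading, set the utility to 0.95² = 0.9025.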