Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Data Source for Reasoning Embodied Agents
Authors: Jack Lanchantin, Sainbayar Sukhbaatar, Gabriel Synnaeve, Yuxuan Sun, Kavya Srinet, Arthur Szlam
AAAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the results of several baseline models on instantiations of train sets. These include pre-trained language models fine-tuned on a text-formatted representation of the database, and graph-structured Transformers operating on a knowledge-graph representation of the database. |
| Researcher Affiliation | Industry | Meta AI EMAIL |
| Pseudocode | No | The paper includes mathematical equations for model operations and loss functions but does not provide any pseudocode or algorithm blocks. |
| Open Source Code | No | Code to generate the data and train the models will be released at github.com/facebookresearch/neuralmemory. |
| Open Datasets | No | In this work, to further pursue these advances, we introduce a new data generator for machine reasoning that integrates with an embodied agent. The generated data consists of templated text queries and answers, matched with world-states encoded into a database. ... We propose a context-question-answer data generator for embodied agents. ... With this data generation framework, we can create arbitrary amounts of simulated data. |
| Dataset Splits | Yes | Since we are generating the data, we vary the training samples from 1k to 1M, and use a validation set of 10k samples. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions using 'GPT-2 small model from the Hugging Face library' and 'Adam' optimizer, but it does not specify concrete version numbers for these or other software dependencies. |
| Experiment Setup | Yes | All our models are trained using Adam (Kingma and Ba 2014) for 5,000 epochs, where each epoch is over a chunk of 10,000 training samples. We use a linear warmup of 10,000 steps and cosine decay (Loshchilov and Hutter 2016). For the GPT2 model, we consider learning rates {1e-4, 5e-4, 1e-5} using a batch size of 32. For the structured model, we consider learning rates {1e-4, 5e-4, 1e-5}, batch size 32, layers {2, 3}, and embedding dimensions {256, 512}. |
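
The Experiment Setup row describes a linear warmup followed by cosine decay, plus a small hyperparameter grid for the structured model. The following is a minimal sketch of how that schedule and grid could be written down in plain Python; it is not the authors' code. The function name `lr_at_step`, the decay horizon `TOTAL_STEPS` (derived here by assuming one optimizer step per batch of 32 over 5,000 epochs of 10,000 samples), and the floor `min_lr=0.0` are all assumptions, since the quoted text does not state them.

```python
import math
from itertools import product

# Values quoted in the Experiment Setup row.
WARMUP_STEPS = 10_000
BATCH_SIZE = 32
EPOCHS = 5_000
SAMPLES_PER_EPOCH = 10_000

# Assumption: one optimizer step per batch, so the decay horizon is
# epochs * ceil(samples_per_epoch / batch_size). The source does not state it.
STEPS_PER_EPOCH = math.ceil(SAMPLES_PER_EPOCH / BATCH_SIZE)
TOTAL_STEPS = EPOCHS * STEPS_PER_EPOCH


def lr_at_step(step, base_lr, warmup_steps=WARMUP_STEPS,
               total_steps=TOTAL_STEPS, min_lr=0.0):
    """Learning rate with linear warmup, then cosine decay (hypothetical helper)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup period.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup schedule completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Search grid quoted for the structured model.
LEARNING_RATES = [1e-4, 5e-4, 1e-5]
NUM_LAYERS = [2, 3]
EMBED_DIMS = [256, 512]

structured_grid = [
    {"lr": lr, "layers": n_layers, "embed_dim": dim, "batch_size": BATCH_SIZE}
    for lr, n_layers, dim in product(LEARNING_RATES, NUM_LAYERS, EMBED_DIMS)
]

if __name__ == "__main__":
    print(f"{len(structured_grid)} structured-model configurations")
    for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS):
        print(step, lr_at_step(step, base_lr=1e-4))
```

Under these assumptions the structured-model grid has 3 × 2 × 2 = 12 configurations, while the GPT-2 grid varies only the learning rate. In a real training loop the same schedule would normally be delegated to the framework's scheduler rather than computed by hand.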