Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning

Authors: Maxwell Nye, Michael Henry Tessler, Joshua B. Tenenbaum, Brenden M. Lake

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results in robust story generation and grounded instruction-following show that this approach can increase the coherence and accuracy of neurally-based generations.
Researcher Affiliation | Collaboration | Maxwell Nye (MIT); Michael Henry Tessler (MIT, DeepMind); Joshua B. Tenenbaum (MIT); Brenden M. Lake (NYU, Facebook AI Research)
Pseudocode | No | The paper includes schematic diagrams (Figure 1, Figure 6) illustrating the system's flow but does not contain any formal pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is available.
Open Datasets | Yes | We first illustrate the approach by generating short stories based on the bAbI dataset (Weston et al., 2015); this pedagogical, synthetic example illustrates how basic commonsense knowledge of objects, agents, and places can inform a text generation model. We then test our approach on rich, natural language vignettes based on CLUTRR (Sinha et al., 2019), focusing on ensuring consistency of family and interpersonal relationships. We use the gSCAN benchmark (Ruis et al., 2020), a recently proposed grounded instruction following dataset designed to measure compositional generalization in neural systems.
Dataset Splits | No | The paper mentions training data sizes for gSCAN ("5000 datapoints, 8000 datapoints, and 20000 datapoints") and refers to a "dev" split in Table 2, but it does not provide specific percentages or counts for training/validation/test splits, nor does it detail a cross-validation setup for full reproducibility.
Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions several models and tools used (e.g., GPT-3, BART, the Z3 solver, RoBERTa MNLI), but it does not specify version numbers for these or other software dependencies, which are necessary for reproducibility.
Experiment Setup | Yes | For the bAbI examples, we use GPT-3 as our System 1 proposal model through few-shot prompting with 10 example bAbI stories as context, generating a new story one candidate sentence at a time. For all System 1 generations, we used model temperature of 1.0. For the neural NLI baseline, we used 0.9 probability of contradiction as the cutoff for rejection. Our dual-system model uses a sampling budget of 10 System 1 samples per sentence. In our experiments, we use a sample-based search with a maximum budget of 50 samples.
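The Experiment Setup row describes a propose-and-check generation loop: System 1 (few-shot prompted GPT-3 at temperature 1.0) proposes candidate sentences one at a time, and a checker accepts or rejects each candidate, with a budget of 10 samples per sentence. The following minimal Python sketch illustrates that loop using the neural NLI baseline as the checker (RoBERTa MNLI, rejecting candidates whose contradiction probability against the story so far exceeds 0.9). It is not the authors' code (none is released); the paper's full dual-system model checks candidates against a symbolic world model rather than an NLI head, and the model name, prompt handling, and helper names below are assumptions.

# Sketch of the sample-and-check loop described above (assumptions noted in the lead-in).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NLI_NAME = "roberta-large-mnli"  # assumed checkpoint for the RoBERTa MNLI baseline
nli_tok = AutoTokenizer.from_pretrained(NLI_NAME)
nli_model = AutoModelForSequenceClassification.from_pretrained(NLI_NAME)

def contradiction_prob(premise: str, hypothesis: str) -> float:
    """Probability that `hypothesis` contradicts `premise` under the MNLI head."""
    inputs = nli_tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # roberta-large-mnli label order: 0 = contradiction, 1 = neutral, 2 = entailment
    return probs[0].item()

def generate_story(propose_sentence, num_sentences=5,
                   samples_per_sentence=10, reject_threshold=0.9):
    """Grow a story one sentence at a time, rejecting contradictory candidates.

    `propose_sentence(story_so_far)` stands in for the System 1 proposal model
    (few-shot prompted GPT-3 sampled at temperature 1.0 in the paper).
    """
    story = []
    for _ in range(num_sentences):
        for _ in range(samples_per_sentence):  # per-sentence sampling budget
            candidate = propose_sentence(" ".join(story))
            if not story or contradiction_prob(" ".join(story), candidate) < reject_threshold:
                story.append(candidate)
                break
        else:
            break  # no acceptable candidate found within the budget
    return " ".join(story)

In use, propose_sentence would wrap whatever sampling language model plays the System 1 role; the 0.9 contradiction cutoff and the budget of 10 samples per sentence mirror the values quoted in the table above.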